How Hard Is Computing the Edit Distance?1

Total Page:16

File Type:pdf, Size:1020Kb

How Hard Is Computing the Edit Distance?1 Information and Computation 165, 1–13 (2001) doi:10.1006/inco.2000.2914, available online at http://www.idealibrary.com on How Hard Is Computing the Edit Distance?1 Giovanni Pighizzini Dipartimento di Scienze dell’Informazione, Universita` degli Studi di Milano, via Comelico 39, 20135 Milan, Italy E-mail: [email protected] Received October 26, 1995 The notion of edit distance arises in very different fields such as self-correcting codes, parsing theory, speech recognition, and molecular biology. The edit distance between an input string and a language L is the minimum cost of a sequence of edit operations (substitution of a symbol in another incorrect symbol, insertion of an extraneous symbol, deletion of a symbol) needed to change the input string into a sentence of L. In this paper we study the complexity of computing the edit distance, discovering sharp boundaries between classes of languages for which this function can be efficiently evaluated and classes of languages for which it seems to be difficult to compute. Our main result is a parallel algorithm for computing the edit distance for the class of languages accepted by one-way nondeterministic auxiliary pushdown automata working in polynomial time, a class that strictly contains context–free languages. Moreover, we show that this algorithm can be extended in order to find a sentence of the language from which the input string has minimum distance. C 2001 Academic Press Key Words: formal languages; computational complexity; string correction; error correction; edit distance; dynamic programming. 1 INTRODUCTION Consider a language L 6 and consider receiving as input, strings that represent instances of L, but possibly containing errors (due, for example, to faulty data entry or to noise in the communication channel). We want to find the original sentences of L from which it is most likely the strings we have received were originated. In order to find such sentences, we adopt a minimum distance criterion; i.e., given an input string x, we are interested in determining a sentence y ∈ L from which x has minimal edit distance. The edit distance between two strings is the minimum cost of a sequence of edit operations (i.e., the substitution of a symbol by an incorrect symbol, the insertion of an extraneous symbol, and the deletion of a symbol) that transform one string into another. Thus, the edit distance of a given language L is the function which associates with any input string x the minimal distance between strings belonging to L and x. The notion of edit distance is due to Levenshtein [20]. However, it is not easy to exactly date the concept of string comparison. Among former works, we recall Damerau [12], Alberga [3], Morgan [23], Wagner [29]. Concerning applications of this notion, we briefly recall that edit distances were initially studied in code theory (Levenshtein [20] and Ullman [27]), parsing theory (Aho and Peterson [2], Lyon [21], and Pighizzini [25]), and speech recognition (Okuda et al. [24] and Ackroyd [1]). Interesting combinatorial problems on strings, as for example the computation of longest common subsequences (Apostolico and Guerra [7]), can be reduced to edit distances. Very recently they regained a certain popularity due to their connections with problems in molecular biology (Eppstein et al. [13] and Karp [18]). (For an extended overview of the history and applications of edit distances, we refer the reader to Kruskal [19].) In this paper, we are interested in establishing general results about the complexity of computing edit distances between languages and strings. In particular, we want to delineate the boundaries among classes of languages for which at least a strong evidence can be given on the hardness of edit distances and classes for which edit distances can be efficiently computed. The results we obtain are quite tight. More precisely, in Section 3, we show that P = PSPACE if and only if the edit distance of each 1 Partially supported by Ministero dell’Universit`ae della Ricerca Scientifica e Tecnologica, under the project “Modelli di calcolo innovativi: metodi sintattici e combinatori.” A preliminary version of this work appeared in Fundamentals of Computation Theory (FCT’95), Proceedings, Lecture Notes in Computer Science, Vol. 965, pp. 383–392, Springer-Verlag, Berlin/NewYork, 1995. 1 0890-5401/01 $35.00 Copyright C 2001 by Academic Press All rights of reproduction in any form reserved. 2 GIOVANNI PIGHIZZINI context-sensitive language is computable in polynomial time, and P = NP if and only if the edit distance of each language belonging to the class co-NTIME(log n) (a small subclass of AC0) is computable in polynomial time. On the other hand, if we restrict our attention to one-way devices, edit distance becomes feasible. In fact, our main result, proved in Section 4, states that the edit distance of languages accepted by one-way nondeterministic auxiliary pushdown automata (i.e., logspace bounded Turing machines augmented with a pushdown store) working in polynomial time can be efficiently computed. (This class properly contains the context-free languages.) In general people are not really interested in computing edit distances, but rather in finding a sentence with minimal edit distance from the input string (correction problem). In Section 3, we consider the correction problem and we state hardness results similar to those proved for the edit distance. On the other hand, in Section 5, we show that in the case of languages accepted by one-way nondeterministic auxiliary pushdown automata working in polynomial time, the correction problem can be solved in the same bounds given in Section 4 for computing the edit distance. This paper is a continuation of a stream of research whose aim is that of identifying properties other than the membership which are easily computable for certain classes of languages (see, e.g., Huynh [15, 16] and Allender et al. [4–6]). 2 PRELIMINARY DEFINITIONS AND RESULTS Given a finite alphabet 6, we denote by 6 the free monoid of strings over 6 and by the empty string. Given a string x ∈ 6 , we will denote by |x| its length and by xi the ith symbol of x, i = 1,...,n. Now, we briefly recall the notion of edit distance between two strings and, subsequently, that of edit distance between a language and a string. An edit operation is a pair (a, b) 6= (, ), where |a|, |b|1, and is usually denoted as a → b. A string y can be derived from a string x applying the edit operation → a → b, in symbols x =⇒a b y ,ifx =vawand y = vbw, for suitable strings v,w.Fora,b ∈6,a 6= b, the operations a → b, a → , and → a are called substitution, deletion, and insertion, respectively. Given an edit sequence S = (s1,...,sm), namely a sequence of edit operations, and two strings x, y ∈ 6 , we say that y can be derived from x applying the sequence S, in symbols x =⇒S y , if there exist strings si z0, z1,...,zm such that z0 = x, zm = y, and, for i = 1,...,m,zi1 =⇒ z i . With each edit operation we associate a nonnegative cost. More precisely, we consider a cost function where (a → b) is the cost of the edit operation a → b. Then, the cost of an edit sequence S can be defined as the sum of costs of operations in S. DEFINITION 2.1. Given two strings x, y ∈ 6 , the edit distance d(x, y) from x to y is the minimum of the costs of edit sequences which transform x into y. Given a string x ∈ 6 and a language L 6, the edit distance from L to x is the minimum among the costs of edit sequences which transform strings of L into x, i.e., d(L, x) = min{d(y, x) | y ∈ L}, with the convention that d(∅, x) =∞. For a fixed language L 6, the function d(L, x), in the variable x, is called the edit distance of L. We assume that, for a, b, c ∈ 6 ∪{},(a→b)=0 if and only if a = b, and (a → b) + (b → c) (a → c). Note that if is symmetric, then d is a metric on 6. EXAMPLE. Let 6 ={0,1},(a→b)=1 for a, b ∈ 6 ∪{},a 6= b, x = 0011, y = 010, and z = 100. String x can be transformed into y using the sequence S = (0 → 1, 1 → , 1 → 0). In fact: → → → x = 0011 =⇒0 1 0111 =⇒1 011 =⇒1 0 010 = y. The cost of S is 3. However, this is not optimal. In fact, it can be shown that d(x, y) = 2, considering the following sequence: → → x = 0011 =⇒0 011 =⇒1 0 010 = y. Consider now the language L ={0n1n |n 0}. It can be shown that d(L, 100) = 2. A string belonging to L from which 100 has minimal distance is 01. COMPUTING THE EDIT DISTANCE 3 The edit distance between two strings can be easily computed from edit distances between substrings. In particular, it is immediate to prove the following relationships: ∀u,v,w,z ∈6:d(uv,wz) d(u,w)+d(v, z); ∀x0, x00, y ∈ 6: d(x0x00, y) = min{d(x0, y0) + d(x00, y00) | y = y0 y00}. Now, we show two properties of edit distances useful to state our results. The first property gives an upper bound for the edit distance from a language L to a string x. The second property states an upper bound for the length of strings belonging to L which minimize the distance. LEMMA 2.1. Given a nonempty language L 6 and a cost function , there exist constants , , 0,0 such that d(L, x) |x|+and |y|0|x|+0 for each x ∈ 6 and for each y ∈ L such that d(L, x) = d(y, x).
Recommended publications
  • CS601 DTIME and DSPACE Lecture 5 Time and Space Functions: T, S
    CS601 DTIME and DSPACE Lecture 5 Time and Space functions: t, s : N → N+ Definition 5.1 A set A ⊆ U is in DTIME[t(n)] iff there exists a deterministic, multi-tape TM, M, and a constant c, such that, 1. A = L(M) ≡ w ∈ U M(w)=1 , and 2. ∀w ∈ U, M(w) halts within c · t(|w|) steps. Definition 5.2 A set A ⊆ U is in DSPACE[s(n)] iff there exists a deterministic, multi-tape TM, M, and a constant c, such that, 1. A = L(M), and 2. ∀w ∈ U, M(w) uses at most c · s(|w|) work-tape cells. (Input tape is “read-only” and not counted as space used.) Example: PALINDROMES ∈ DTIME[n], DSPACE[n]. In fact, PALINDROMES ∈ DSPACE[log n]. [Exercise] 1 CS601 F(DTIME) and F(DSPACE) Lecture 5 Definition 5.3 f : U → U is in F (DTIME[t(n)]) iff there exists a deterministic, multi-tape TM, M, and a constant c, such that, 1. f = M(·); 2. ∀w ∈ U, M(w) halts within c · t(|w|) steps; 3. |f(w)|≤|w|O(1), i.e., f is polynomially bounded. Definition 5.4 f : U → U is in F (DSPACE[s(n)]) iff there exists a deterministic, multi-tape TM, M, and a constant c, such that, 1. f = M(·); 2. ∀w ∈ U, M(w) uses at most c · s(|w|) work-tape cells; 3. |f(w)|≤|w|O(1), i.e., f is polynomially bounded. (Input tape is “read-only”; Output tape is “write-only”.
    [Show full text]
  • The Complexity of Space Bounded Interactive Proof Systems
    The Complexity of Space Bounded Interactive Proof Systems ANNE CONDON Computer Science Department, University of Wisconsin-Madison 1 INTRODUCTION Some of the most exciting developments in complexity theory in recent years concern the complexity of interactive proof systems, defined by Goldwasser, Micali and Rackoff (1985) and independently by Babai (1985). In this paper, we survey results on the complexity of space bounded interactive proof systems and their applications. An early motivation for the study of interactive proof systems was to extend the notion of NP as the class of problems with efficient \proofs of membership". Informally, a prover can convince a verifier in polynomial time that a string is in an NP language, by presenting a witness of that fact to the verifier. Suppose that the power of the verifier is extended so that it can flip coins and can interact with the prover during the course of a proof. In this way, a verifier can gather statistical evidence that an input is in a language. As we will see, the interactive proof system model precisely captures this in- teraction between a prover P and a verifier V . In the model, the computation of V is probabilistic, but is typically restricted in time or space. A language is accepted by the interactive proof system if, for all inputs in the language, V accepts with high probability, based on the communication with the \honest" prover P . However, on inputs not in the language, V rejects with high prob- ability, even when communicating with a \dishonest" prover. In the general model, V can keep its coin flips secret from the prover.
    [Show full text]
  • Lecture 14 (Feb 28): Probabilistically Checkable Proofs (PCP) 14.1 Motivation: Some Problems and Their Approximation Algorithms
    CMPUT 675: Computational Complexity Theory Winter 2019 Lecture 14 (Feb 28): Probabilistically Checkable Proofs (PCP) Lecturer: Zachary Friggstad Scribe: Haozhou Pang 14.1 Motivation: Some problems and their approximation algorithms Many optimization problems are NP-hard, and therefore it is unlikely to efficiently find optimal solutions. However, in some situations, finding a provably good-enough solution (approximation of the optimal) is also acceptable. We say an algorithm is an α-approximation algorithm for an optimization problem iff for every instance of the problem it can find a solution whose cost is a factor of α of the optimum solution cost. Here are some selected problems and their approximation algorithms: Example: Min Vertex Cover is the problem that given a graph G = (V; E), the goal is to find a vertex cover C ⊆ V of minimum size. The following algorithm gives a 2-approximation for Min Vertex Cover: Algorithm 1 Min Vertex Cover Algorithm Input: Graph G = (V; E) Output: a vertex cover C of G. C ; while some edge (u; v) has u; v2 = C do C C [ fu; vg end while return C Claim 1 Let C∗ be an optimal solution, the returned solution C satisfies jCj ≤ 2jC∗j. Proof. Let M be the edges considered by the algorithm in the loop, we have jCj = 2jMj. Also, jC∗j covers M and no two edges in M share an endpoint (M is a matching), so jC∗j ≥ jMj. Therefore, jCj = 2jMj ≤ 2jC∗j. Example: Max SAT is the problem that given a CNF formula, the goal is to find a assignment that satisfies as many clauses as possible.
    [Show full text]
  • Dictionary Look-Up Within Small Edit Distance
    Dictionary Lo okUp Within Small Edit Distance Ab dullah N Arslan and Omer Egeciog lu Department of Computer Science University of California Santa Barbara Santa Barbara CA USA farslanomergcsucsbedu Abstract Let W b e a dictionary consisting of n binary strings of length m each represented as a trie The usual dquery asks if there exists a string in W within Hamming distance d of a given binary query string q We present an algorithm to determine if there is a memb er in W within edit distance d of a given query string q of length m The metho d d+1 takes time O dm in the RAM mo del indep endent of n and requires O dm additional space Intro duction Let W b e a dictionary consisting of n binary strings of length m each A dquery asks if there exists a string in W within Hamming distance d of a given binary query string q Algorithms for answering dqueries eciently has b een a topic of interest for some time and have also b een studied as the approximate query and the approximate query retrieval problems in the literature The problem was originally p osed by Minsky and Pap ert in in which they asked if there is a data structure that supp orts fast dqueries The cases of small d and large d for this problem seem to require dierent techniques for their solutions The case when d is small was studied by Yao and Yao Dolev et al and Greene et al have made some progress when d is relatively large There are ecient algorithms only when d prop osed by Bro dal and Venkadesh Yao and Yao and Bro dal and Gasieniec The small d case has applications
    [Show full text]
  • Approximate String Matching with Reduced Alphabet
    Approximate String Matching with Reduced Alphabet Leena Salmela1 and Jorma Tarhio2 1 University of Helsinki, Department of Computer Science [email protected] 2 Aalto University Deptartment of Computer Science and Engineering [email protected] Abstract. We present a method to speed up approximate string match- ing by mapping the factual alphabet to a smaller alphabet. We apply the alphabet reduction scheme to a tuned version of the approximate Boyer– Moore algorithm utilizing the Four-Russians technique. Our experiments show that the alphabet reduction makes the algorithm faster. Especially in the k-mismatch case, the new variation is faster than earlier algorithms for English data with small values of k. 1 Introduction The approximate string matching problem is defined as follows. We have a pat- tern P [1...m]ofm characters drawn from an alphabet Σ of size σ,atextT [1...n] of n characters over the same alphabet, and an integer k. We need to find all such positions i of the text that the distance between the pattern and a sub- string of the text ending at that position is at most k.Inthek-difference problem the distance between two strings is the standard edit distance where substitu- tions, deletions, and insertions are allowed. The k-mismatch problem is a more restricted one using the Hamming distance where only substitutions are allowed. Among the most cited papers on approximate string matching are the classical articles [1,2] by Esko Ukkonen. Besides them he has studied this topic extensively [3,4,5,6,7,8,9,10,11].
    [Show full text]
  • Backtrack Parsing Context-Free Grammar Context-Free Grammar
    Context-free Grammar Problems with Regular Context-free Grammar Language and Is English a regular language? Bad question! We do not even know what English is! Two eggs and bacon make(s) a big breakfast Backtrack Parsing Can you slide me the salt? He didn't ought to do that But—No! Martin Kay I put the wine you brought in the fridge I put the wine you brought for Sandy in the fridge Should we bring the wine you put in the fridge out Stanford University now? and University of the Saarland You said you thought nobody had the right to claim that they were above the law Martin Kay Context-free Grammar 1 Martin Kay Context-free Grammar 2 Problems with Regular Problems with Regular Language Language You said you thought nobody had the right to claim [You said you thought [nobody had the right [to claim that they were above the law that [they were above the law]]]] Martin Kay Context-free Grammar 3 Martin Kay Context-free Grammar 4 Problems with Regular Context-free Grammar Language Nonterminal symbols ~ grammatical categories Is English mophology a regular language? Bad question! We do not even know what English Terminal Symbols ~ words morphology is! They sell collectables of all sorts Productions ~ (unordered) (rewriting) rules This concerns unredecontaminatability Distinguished Symbol This really is an untiable knot. But—Probably! (Not sure about Swahili, though) Not all that important • Terminals and nonterminals are disjoint • Distinguished symbol Martin Kay Context-free Grammar 5 Martin Kay Context-free Grammar 6 Context-free Grammar Context-free
    [Show full text]
  • Adaptive LL(*) Parsing: the Power of Dynamic Analysis
    Adaptive LL(*) Parsing: The Power of Dynamic Analysis Terence Parr Sam Harwell Kathleen Fisher University of San Francisco University of Texas at Austin Tufts University [email protected] [email protected] kfi[email protected] Abstract PEGs are unambiguous by definition but have a quirk where Despite the advances made by modern parsing strategies such rule A ! a j ab (meaning “A matches either a or ab”) can never as PEG, LL(*), GLR, and GLL, parsing is not a solved prob- match ab since PEGs choose the first alternative that matches lem. Existing approaches suffer from a number of weaknesses, a prefix of the remaining input. Nested backtracking makes de- including difficulties supporting side-effecting embedded ac- bugging PEGs difficult. tions, slow and/or unpredictable performance, and counter- Second, side-effecting programmer-supplied actions (muta- intuitive matching strategies. This paper introduces the ALL(*) tors) like print statements should be avoided in any strategy that parsing strategy that combines the simplicity, efficiency, and continuously speculates (PEG) or supports multiple interpreta- predictability of conventional top-down LL(k) parsers with the tions of the input (GLL and GLR) because such actions may power of a GLR-like mechanism to make parsing decisions. never really take place [17]. (Though DParser [24] supports The critical innovation is to move grammar analysis to parse- “final” actions when the programmer is certain a reduction is time, which lets ALL(*) handle any non-left-recursive context- part of an unambiguous final parse.) Without side effects, ac- free grammar. ALL(*) is O(n4) in theory but consistently per- tions must buffer data for all interpretations in immutable data forms linearly on grammars used in practice, outperforming structures or provide undo actions.
    [Show full text]
  • Approximate Boyer-Moore String Matching
    APPROXIMATE BOYER-MOORE STRING MATCHING JORMA TARHIO AND ESKO UKKONEN University of Helsinki, Department of Computer Science Teollisuuskatu 23, SF-00510 Helsinki, Finland Draft Abstract. The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mis- matches. Our generalized Boyer-Moore algorithm is shown (under a mild 1 independence assumption) to solve the problem in expected time O(kn( + m – k k)) where c is the size of the alphabet. A related algorithm is developed for the c k differences problem where the task is to find all approximate occurrences of a pattern in a text with ≤ k differences (insertions, deletions, changes). Experimental evaluation of the algorithms is reported showing that the new algorithms are often significantly faster than the old ones. Both algorithms are functionally equivalent with the Horspool version of the Boyer-Moore algorithm when k = 0. Key words: String matching, edit distance, Boyer-Moore algorithm, k mismatches problem, k differences problem AMS (MOS) subject classifications: 68C05, 68C25, 68H05 Abbreviated title: Approximate Boyer-Moore Matching 2 1. Introduction The fastest known exact string matching algorithms are based on the Boyer- Moore idea [BoM77, KMP77]. Such algorithms are “sublinear” on the average in the sense that it is not necessary to check every symbol in the text. The larger is the alphabet and the longer is the pattern, the faster the algorithm works.
    [Show full text]
  • Sequence Alignment/Map Format Specification
    Sequence Alignment/Map Format Specification The SAM/BAM Format Specification Working Group 3 Jun 2021 The master version of this document can be found at https://github.com/samtools/hts-specs. This printing is version 53752fa from that repository, last modified on the date shown above. 1 The SAM Format Specification SAM stands for Sequence Alignment/Map format. It is a TAB-delimited text format consisting of a header section, which is optional, and an alignment section. If present, the header must be prior to the alignments. Header lines start with `@', while alignment lines do not. Each alignment line has 11 mandatory fields for essential alignment information such as mapping position, and variable number of optional fields for flexible or aligner specific information. This specification is for version 1.6 of the SAM and BAM formats. Each SAM and BAMfilemay optionally specify the version being used via the @HD VN tag. For full version history see Appendix B. Unless explicitly specified elsewhere, all fields are encoded using 7-bit US-ASCII 1 in using the POSIX / C locale. Regular expressions listed use the POSIX / IEEE Std 1003.1 extended syntax. 1.1 An example Suppose we have the following alignment with bases in lowercase clipped from the alignment. Read r001/1 and r001/2 constitute a read pair; r003 is a chimeric read; r004 represents a split alignment. Coor 12345678901234 5678901234567890123456789012345 ref AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCAT +r001/1 TTAGATAAAGGATA*CTG +r002 aaaAGATAA*GGATA +r003 gcctaAGCTAA +r004 ATAGCT..............TCAGC -r003 ttagctTAGGC -r001/2 CAGCGGCAT The corresponding SAM format is:2 1Charset ANSI X3.4-1968 as defined in RFC1345.
    [Show full text]
  • Lexing and Parsing with ANTLR4
    Lab 2 Lexing and Parsing with ANTLR4 Objective • Understand the software architecture of ANTLR4. • Be able to write simple grammars and correct grammar issues in ANTLR4. EXERCISE #1 Lab preparation Ï In the cap-labs directory: git pull will provide you all the necessary files for this lab in TP02. You also have to install ANTLR4. 2.1 User install for ANTLR4 and ANTLR4 Python runtime User installation steps: mkdir ~/lib cd ~/lib wget http://www.antlr.org/download/antlr-4.7-complete.jar pip3 install antlr4-python3-runtime --user Then in your .bashrc: export CLASSPATH=".:$HOME/lib/antlr-4.7-complete.jar:$CLASSPATH" export ANTLR4="java -jar $HOME/lib/antlr-4.7-complete.jar" alias antlr4="java -jar $HOME/lib/antlr-4.7-complete.jar" alias grun='java org.antlr.v4.gui.TestRig' Then source your .bashrc: source ~/.bashrc 2.2 Structure of a .g4 file and compilation Links to a bit of ANTLR4 syntax : • Lexical rules (extended regular expressions): https://github.com/antlr/antlr4/blob/4.7/doc/ lexer-rules.md • Parser rules (grammars) https://github.com/antlr/antlr4/blob/4.7/doc/parser-rules.md The compilation of a given .g4 (for the PYTHON back-end) is done by the following command line: java -jar ~/lib/antlr-4.7-complete.jar -Dlanguage=Python3 filename.g4 or if you modified your .bashrc properly: antlr4 -Dlanguage=Python3 filename.g4 2.3 Simple examples with ANTLR4 EXERCISE #2 Demo files Ï Work your way through the five examples in the directory demo_files: Aurore Alcolei, Laure Gonnord, Valentin Lorentz. 1/4 ENS de Lyon, Département Informatique, M1 CAP Lab #2 – Automne 2017 ex1 with ANTLR4 + Java : A very simple lexical analysis1 for simple arithmetic expressions of the form x+3.
    [Show full text]
  • Problem Set 7 Solutions
    Introduction to Algorithms November 18, 2005 Massachusetts Institute of Technology 6.046J/18.410J Professors Erik D. Demaine and Charles E. Leiserson Handout 25 Problem Set 7 Solutions Problem 7-1. Edit distance In this problem you will write a program to compute edit distance. This problem is mandatory. Failure to turn in a solution will result in a serious and negative impact on your term grade! We advise you to start this programming assignment as soon as possible, because getting all the details right in a program can take longer than you think. Many word processors and keyword search engines have a spelling correction feature. If you type in a misspelled word x, the word processor or search engine can suggest a correction y. The correction y should be a word that is close to x. One way to measure the similarity in spelling between two text strings is by “edit distance.” The notion of edit distance is useful in other fields as well. For example, biologists use edit distance to characterize the similarity of DNA or protein sequences. The edit distance d(x; y) of two strings of text, x[1 : : m] and y[1 : : n], is defined to be the minimum possible cost of a sequence of “transformation operations” (defined below) that transforms string x[1 : : m] into string y[1 : : n].1 To define the effect of the transformation operations, we use an auxiliary string z[1 : : s] that holds the intermediate results. At the beginning of the transformation sequence, s = m and z[1 : : s] = x[1 : : m] (i.e., we start with string x[1 : : m]).
    [Show full text]
  • Probabilistic Proof Systems: a Primer
    Probabilistic Proof Systems: A Primer Oded Goldreich Department of Computer Science and Applied Mathematics Weizmann Institute of Science, Rehovot, Israel. June 30, 2008 Contents Preface 1 Conventions and Organization 3 1 Interactive Proof Systems 4 1.1 Motivation and Perspective ::::::::::::::::::::::: 4 1.1.1 A static object versus an interactive process :::::::::: 5 1.1.2 Prover and Veri¯er :::::::::::::::::::::::: 6 1.1.3 Completeness and Soundness :::::::::::::::::: 6 1.2 De¯nition ::::::::::::::::::::::::::::::::: 7 1.3 The Power of Interactive Proofs ::::::::::::::::::::: 9 1.3.1 A simple example :::::::::::::::::::::::: 9 1.3.2 The full power of interactive proofs ::::::::::::::: 11 1.4 Variants and ¯ner structure: an overview ::::::::::::::: 16 1.4.1 Arthur-Merlin games a.k.a public-coin proof systems ::::: 16 1.4.2 Interactive proof systems with two-sided error ::::::::: 16 1.4.3 A hierarchy of interactive proof systems :::::::::::: 17 1.4.4 Something completely di®erent ::::::::::::::::: 18 1.5 On computationally bounded provers: an overview :::::::::: 18 1.5.1 How powerful should the prover be? :::::::::::::: 19 1.5.2 Computational Soundness :::::::::::::::::::: 20 2 Zero-Knowledge Proof Systems 22 2.1 De¯nitional Issues :::::::::::::::::::::::::::: 23 2.1.1 A wider perspective: the simulation paradigm ::::::::: 23 2.1.2 The basic de¯nitions ::::::::::::::::::::::: 24 2.2 The Power of Zero-Knowledge :::::::::::::::::::::: 26 2.2.1 A simple example :::::::::::::::::::::::: 26 2.2.2 The full power of zero-knowledge proofs ::::::::::::
    [Show full text]