<<

Boyer-Moore strategy to efficient approximate string matching Nadia El Mabrouk, Maxime Crochemore

To cite this version:

Nadia El Mabrouk, Maxime Crochemore. Boyer-Moore strategy to efficient approximate string match- ing. Combinatorial (Labuna Beach, California, 1996), 1996, France. pp.24-38. ￿hal-00620020￿

HAL Id: hal-00620020 https://hal-upec-upem.archives-ouvertes.fr/hal-00620020 Submitted on 26 Mar 2013

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés.

BoyerMoore strategy to ecient approximate

string matching

Nadia ElMabrouk and Maxime Cro chemore

IGM Universite Marne la Vallee

rue de la Butte Verte Noisy Le Grand Cedex

Abstract We prop ose a simple but ecient algorithm for searching all

o ccurrences of a pattern or a class of patterns length m in a text length

n with at most k mismatches

This algorithm relies on the ShiftAdd algorithm of BaezaYates and

Gonnet which involves representing by a bit number the current

state of the search and uses the ability of programming languages to

handle bit words State representation should not therefore exceeds the

word size that is mdlog k e This algorithm consists

2

in a prepro cessing step and a searching step It is linear and p erforms n

op erations during the searching step

Notions of shift and character skip found in the BoyerMoore BM

approach are introduced in this algorithm Provided that the considered

alphab et is large enough compared to the Pattern length the average

number of op erations p erformed by our algorithm during the searching

k +4

step b ecomes n

mk

Introduction

Our purp ose is approximate matching of a pattern or a class of patterns in a text

all sequences of characters or classes of characters from a nite alphab et Er

rors considered here are mismatches A class of patterns is a set of patterns with

dont care symbols patterns containing the complementary of a character or any

other class of characters Such a problem has a lot of applications in particular

in molecular biology for predicting p otential nuclear geneco ding sequences in

genomic DNA sequences In fact exact string matching is not sucient since

geneco ding sequences are in general only partially and approximately sp ecied

Concerning exact string matching algorithms based on the BoyerMoore

BM approach are the fastest in practice Such algorithms are linear

and may even have a sublinear b ehaviour in the sense that every character in

the text need not b e checked In certain cases text characters can b e skipp ed

without missing a pattern o ccurrence The larger the alphab et and the longer

the pattern the faster the algorithm works

Various algorithms have b een developed for searching with k mismatches all

o ccurrences of a pattern length m in a text length n b oth dened over an

alphab et length c Running times have ranged from O mn for the naive

algorithm to O k n or O n log m The rst two algorithms consist

in a prepro cessing step and a searching step Grossi and Luccio algorithm

uses the sux tree Other algorithms have used the BM approach in approximate

string matching Running times are O k n for BaezaYates and Gonnet

k 1

for Tarhio and Ukkonen The problem of approximate and O k n

mk c

matching of a class of patterns was also studied esp ecially in the case

of patterns with dont care symbols Fisher et Paterson

2

developed an O n log c log m log log m time algorithm based on the linear

pro duct Abrahamson extended this metho d for generalized string pattern

Pinter has used the Aho and Corasick automaton for searching a set

of patterns Other algorithms have considered the problem of exact matching

of patterns with variable length dont cares As for Akutsu he

p

2

m m

developed an O log log time algorithm for searching a k m n log c log

k k

pattern with dont cares in a text with dont cares

In several new algorithms for approximate string matching were pub

lished They combine b oth sp eed and programming practicality in

contrast with older results most of which b eing mainly of theoretical interest

Moreover they are exible enough to allow searching for a class of patterns

These algorithms consist in a pattern prepro cessing step and a searching step

They are all based on the same approach consisting in nding at a given p osi

tion in the text all approximate pattern prexes ending at this p osition Sp eed

is increased by representing the state of the search as a bit number or

an array and by using the ability of programming languages to handle bit

words

Nevertheless these algorithms are based on a naive approach and pro cess

each character of the text Our goal is to sp eed up searching by using a BM

strategy and including notions of shift and character skip

We have chosen to consider such an improvement in the case of the Shift

Add algorithm of BaezaYates and Gonnet The main idea of ShiftAdd is

to represent the state of the search as a bit number and p erform a few simple

arithmetic and logical op erations Provided that representations dont exceed

the word size that is mdlog k e each search step do es exactly

2

a shift a test and an addition Therefore this algorithm runs in O n time and

the searching step do es n op erations We developed an algorithm combining the

practicality of the ShiftAdd metho d and the sp eed of the BM approach Provided

that the considered alphab et is large enough compared to m our new algorithm

 

k +4

op erations during the searching step p erforms on average n

mk

The pap er is organized as follows Section summarises the algorithm Shift

Add in the case of exact or approximate matching of a pattern or a class of

patterns Section develops the adaptation of the BM approach to the Shift

Add metho d An improvement of this last algorithm is given in Section Finally

section gives exp erimental results obtained with b oth algorithms

ShiftAdd Algorithm

Let P p p b e a pattern and t t t b e a text over a nite alphab et

1 m 1 n

The problem is to nd in t all o ccurrences of P with at most k mismatches

k m In other words the b etween two patterns of the same length

will b e dened as the number of their mismatching characters the Hamming

distance An equivalent problem is then to nd in t all substrings of length m

such that the Hamming distance b etween these substrings and P is at most k

that is to nd all j p ositions in the text such that for i m p t

i j m+i

except for at most k indices

The main idea is to represent the state of the search as a vector of size

m Thus S denotes the state vector given a current p osition j in the text

j

S contains individual states of the search b etween each prex of P and the

j

corresp onding substring of t Namely for i m S i is the number of

j

mismatches b etween p p and t t

1 i j i+1 j

P matches at j if and only if S m k

j

When t is read the number of mismatches for each prex of P needs to

j +1

b e completed Values of b o olean expressions t p for i m can b e

j +1 i

computed during a prepro cessing step For each character a in a vector T of

a

size m is constructed such that

if a p

i

For i i m T i (1)

a

otherwise

it is sucient to construct the T arrays only for characters app earing in the

pattern

i Finally S i S i T

j +1 j t

j +1

In order to obtain S from S by simple arithmetic and logical op erations

j +1 j

b

vectors are considered as numbers and represented in base where b is the bit

number needed to represent each vector comp onent

m m

X X

(i1)b (i1)b

T i S i and T Thus S

a j a j

i=1 i=1

Representations should not exceed the word size namely mb

It is easy now to verify that the transition from S to S amounts to no

j j +1

more than a left shift denoted by of b bits and an addition

(2) S S b T

j +1 j t

j +1

(m1)b

Initial state is S P matches at j if and only if S k

0 j

Possible values of the vector state comp onents are m Thus to rep

resent each comp onent b dlog m e bits are required However since we

2

need only to compare the number of mismatches with k it is enough to represent

values from to k In this case one more bit is needed for carrying over addi

tions The improved algorithm uses b dlog k e bits At each p osition

2

j in the text the overow bits are recorded in an overow state R and the

j

overow bits of S are reset

j

The ShiftAdd algorithm works in O n time and the searching step disre

garding the overow state p erforms n op erations In fact at each step that is

for each p osition j three op erations are p erformed one shift one addition and

one test to determine whether P matches at p osition j

Exact string matching

In the case of exact string matching it is only necessary to know whether a

given prex of P matches exactly the considered substring of t We dene S as

j

follows



if p    p t    t

1 i j i+1 j

For  i  m S i

j

otherwise

When t is read we need to determine whether t can extend any of the

j +1 j +1

partial matches Thus in order to have a match of p    p at p osition j

1 i

b oth S i and t p should b e satised Here b and in formula

j j +1 i

the symbol should b e replaced by an OR op eration The algorithm based

on this new formula is called ShiftOr

Extensions

Flexibility is one of the principal advantages of the ShiftAdd metho d It can

b e easily adapted to a class of patterns A class of patterns is a set of patterns

dened by a string in which each p osition is a set of characters A set of characters

is for example a subset of or the complementary of a subset of A pattern

class dened by a string in which each p osition is either a single character or the

whole alphab et is called pattern with dont care symbols

To take into account such classes only the denition of the T array needs

changing for a p osition i in P and a character a in T i will contain

a

if a b elongs to the set of characters corresp onding to that p osition in P and

otherwise Thus the T array computed during the prepro cessing step contains all

needed information ab out the pattern Then the searching step is not mo died

BoyerMoore Approach to ShiftAdd metho d

Here we consider the problem of searching all o ccurrences of a pattern string

P p    p in a text string t t    t with at most k mismatches  k  m

1 m 1 n

The problem of exact string matching can b e solved by substituting the OR

op eration to the add op eration In the case of string matching with classes the

T array is mo died as in section

b

For some p osition j the state S is a bit number represented in base

j

dened as previously each individual state S i for  i  m contains the

j

number of mismatches b etween p    p and t    t Here the introduction

1 i j m+1 j

of the overow state is ignored

Shift

Our goal is to avoid pro cessing each character of the text in other words avoid

computing S for every p osition j It is easy to see that if for some prex of

j

length i of P S i k then since S m  S i P will not o ccur at

j j

j +(mi)

p osition j m i The following prop osition can b e deduced

Prop osition For some position j in t let l be the largest index i i

m such that S i k if such an index exists and otherwise Let d m l

j

Then the next position after j where P is likely to occur is j j d In

next

0 0

0

m k for every j such that j j j other words S

j

next

d is the next shift and d m k

Consider now the transition b etween S and S

j j +d

The number of mismatches b etween the prex p p of P for d i m

1 i

and the substring t t of t ie S i is the sum of the number of

j +di+1 j +d j +d

mismatches b etween p p and t t ie S i d and the number

1 id j +di+1 j j

of mismatches b etween p p and t t

id+1 i j +1 j +d

The T array dened in contains the information ab out the o ccurrence of

a given character a at a given p osition in the pattern

Consequently

d1

X

S i d T i r if d i m

j t

j +dr

r =0

S i

j +d

i1

X

T i r otherwise

t

j +dr

r =0

b

In order to obtain S as a sum of numbers in base the next denition

j +d

is needed

Denition D denotes the j j m matrix such that element D am r for

each a and r m is denoted by D and dened as follows

amr

m

X

(i1)b

T i r D

a amr

i=r +1

Intuitively D denotes p ositions in p p containing character a

amr 1 mr

For a xed r D is obtained by a left shift of T of r b p ositions

amr a

S can then b e represented as follows

j +d

d1

X

S S bd D

j +d j t mr

j +dr

r =0

Initial values are S and d m

0

Practically in order to determine the shift d S is shifted b bits at a time

j

(m1)b

until the obtained number is b elow k S will have nally b een shifted d

j

times to obtain S bd Therefore shifts are not group ed

j

Example Let fa b c dg P abbac and k

The D matrix is

a b c d

Successive states and shifts when searching P in t with at most k mismatches

Positions

t a b d a b b a b b a c

S

j

d

Remark Introduction of shifts do es not improve the complexity of the

ShiftAdd algorithm It only has the eect of grouping additions and

tests However shifts are essential to introduce the notion of characters

skip which will nally sp eed up the algorithm

Character skip

The BoyerMoore BM algorithm is an ecient exact string matching algorithm

It is fast since it is p ossible in certain conditions to skip substrings of the text

that is not pro cess them without loss of information At each step characters

of the text are pro cessed from right to left

In this section we try to nd conditions in which parts of the text can b e

avoided without missing o ccurrences of the pattern

Assume j is the last p osition scanned in the text and d is the next shift

The substring of the text still to b e scanned at this step of the search is then

t    t This substring is pro cessed from right to left that is b eginning with

j +1 j +d

t and the pro cessing stops when t is reached or when the information for

j +d j +1

all prexes of P ending at p osition j d is obtained

Practically in order to compute state S S should rst b e shifted on the

j +d j

left of bd bits Let S S bd b e the obtained number Then each of the

j +d0 j

d characters t with  r  d should b e pro cessed Let S b e the

j +dr +1 j +dr

partial state obtained after pro cessing characters t    t of t Then

j +d j +dr +1

S S D and we have S S

j +dr j +dr 1 t mr +1 j +d j +dd

j +dr +1

For given indexes r  r  d and i  i  m

a If S i k then without pro cessing the remaining characters t    t

j +dr j +dr j +1

we know that the prex of length i of P do es not o ccur at p osition j d

b If S i  k and no more comparisons have to b e p erformed for the prex

j +dr

of length i of P then this prex matches at p osition j d

Therefore instead of computing the number of mismatches with the corre

sp onding substring of t for each prex of P ie the terminal state the com

putation stops at the rst partial state giving enough information for further

0 0

are lo b e this partial state Dierences b etween S and S pro cessing Let S

j

j j

cated only in individual states exceeding k

Algorithm We supp ose that the D matrix has b een computed during a pre

pro cessing step For a given p osition j in the text an index r and a pre

x of length i of P State denotes the bit number consisting in individual

states of partial state S for prexes with length from to i More precisely

jr

State S i    S   

jr jr

Algorithm BM approach to approximate string matching

j m d m State

lim k m

While j  n do

i m r

If State  lim then

State State b

i i Go to

Else

If MINd i r then

r r

State State D Go to

t mr

j r

Else

If i m then

Occurrence of P at p osition j

State State b

i i Go to

Else

d m i

j j d

End of While

Prop osition Algorithm nds al l occurrences of the pattern P in the text t

with at most k mismatches

PROOF

Step of the algorithm corresp onds to situation a that is when the

partial number S i of mismatches found at this step of the search for the

jr

prex i of P exceeds k In this case this prex is ignored and the next prex

i of P is considered

Step corresp onds to the situation where there is not enough information

to stop comparing In fact for the prex i of P the number S i of mismatches jr

obtained at this step of the search is less than k but a number of comparisons

remain to b e done for this prex

Step corresp onds to situation b that is when S i k for the

jr

prex i no more comparisons have to b e p erformed for this prex In this case

if i m then p osition j matches and the search for the next shift go es on If

i m then the next shift is equal to m i In fact i corresp onds to the length

of the longest prex of P matching the corresp onding substring of t

Example Let P t and k b e those dened in example

Shifts are the same as for example and only state S is not the terminal state

5

Positions

t a b d a b b a b b a c

S

j

d

Complexity Our goal is to evaluate the average number of op erations p er

formed by Algorithm Op erations are of three kinds shifts additions and tests

Recall that the scanning of t by the ShiftAdd algorithm needs n op erations

since each search step do es exactly a shift an addition and a test It is not dicult

to see that our algorithm do es the same number of shifts n and less additions

In fact one addition is p erformed for each character pro cessed in the text and

not all characters are examined However the number of tests increases since

in addition to those considered by the ShiftAdd algorithm those which make

transitions b etween partial states should b e considered

Let t b e a random text and j j c The probability of a given character to

1

o ccur at a given p osition in the text is then

c

Let X b e a random variable denoting the length of the shift in Algorithm

when searching pattern P in the random text t with at most k mismatches The

following lemma gives the average shift d that is the exp ected value X

m

of the random variable X

Lemma Provided c is large enough compared to m the average shift d ob

m

0

tained by Algorithm exceeds d with

m

!

   

k +1

0

d m k

m

c c

Now we analyze the average number M of characters pro cessed at a given

k

p osition j d of the text where j is the last p osition scanned in the text

m

This number of characters is the length of the smallest substring of t ending at

p osition j d and mismatching all substrings of P which are not prexes The

m

maximum number of characters to b e pro cessed at this step is d

m

From the last remarks we can deduce the following lemma

Lemma Provided c is large enough compared to m M  k

k

We are able now to evaluate the complexity of Algorithm

Prop osition The average number OP of operations performed by Algorithm

k

 

3M +2

k

When the considered alphabet is large enough this number be is n

d

m

 

3k +8

comes OP  n

k

mk

PROOF

Let OP b e the average number of op erations p erformed by Algorithm

d k

m

n

at each step of the search Thus OP OP

k d k

m

d

m

At each step of the search op erations p erformed by Algorithm are M ad

k

ditions one addition p er character d shifts lines and of the algorithm

m

and at most d M tests In fact note rst that condition State  lim

m k

step is true at most d times and in that case we do not pro ceed to step

m

Thus exactly d tests are p erformed in these cases Moreover in order to know

m

the next shift we should go once through step and then do tests and

test could b e avoided by changing the algorithm such that case i m is

examined at a previous step Finally since there are exactly M additions we

k

should go through line exactly M times and at each time do tests and

k

The average number of tests is therefore T d M

d k m k

m

 

3M +2

k

So OP d M and OP n

d k m k k

m

d

m

The case of a large alphab et is deduced from lemmas and 

Improvement

In order to sp eed up the algorithm it is obvious that a way should b e found to

p erform less tests Our idea is to pro cess a certain number of characters at each

step of the search that is do a certain number of additions b efore b eginning

tests

Algorithm

Let j b e the current p osition in the text and d b e the last shift obtained We

denote by C the following number C min d k

d d

C is the average number M lemma of characters pro cessed at each step

d k

by Algorithm provided this number do es not exceed the maximum number of

characters to b e pro cessed at this step that is d

Thus b efore going through steps our improved algorithm Algo

rithm will pro cess rst the C characters t    t and compute the partial

d j C +1 j

d

state S that is do C additions

jC d

d

Algorithm is hence obtained by adding a preliminary step b efore step

in Algorithm

C 1

d

X

State State D

t ml

j l

l=0

r r C

d

Obviously Algorithm nds the same results the same shifts and so the same

average shift d that Algorithm

m

Prop osition Provided the alphabet is large enough the average number of

k +4

operations performed by Algorithm is OP  n

k

mk

PROOF

Our goal is to evaluate the average number OP of op erations p erformed

d k

m

by Algorithm at any search step

First M characters are pro cessed and M additions are p erformed Two

k k

cases are then encountered

The partial state holds enough information so no more characters are pro

cessed at this step In this case d tests and d shifts are p erformed

m m

The partial state do es not hold enough information In this case the max

imum number of characters still to b e p erformed is d M The number

m k

of shifts is the same as that of the previous state and there are d M

m k

more tests

Let P b e the probability of the second case Then OP M d

k d k k m

m

P d M

k m k

and

M P d M

k k m k

OP n

k

d

m

When the considered alphab et is large enough d  m k Lemma M 

m k

k +4

 k Lemma and we can prove that P  Thus OP  n

k k

mk

Exp eriments

Our goal is to nd out under which conditions Algorithm is fastest than algo

rithm ShiftAdd

Algorithm Exact string matching

j m d m State

lim k m initial   

mb mb1

While j  n do

i m r

While State 6 initial and r d do

State State OR D

t mr

j r

rr

End of While

If State initial then

State

d m

Else

If State lim then

Occurrence of P at p osition j

State State b

While State  lim

State State b

i i

End of While

d m i

j j d

End of While

Exact string matching

In this case the considered algorithm is ShiftOr BaezaYates and Gonnet

have introduced the following improvement if at a given p osition j in t S

j

   that is all prexes of P mismatch at p osition j then the next

mb mb1

character pro cessed in the text is p if such a character exists In fact the state

1

remains the same for all other characters

We improve Algorithm as well at a given step of the search characters are

pro cessed until the partial state is equal to    Algorithm

mb mb1

We have exp erimented algorithms ShiftOr and Algorithm on a

character text the french version of the Bible Figure shows the execution

time while searching random patterns from the Bible The rst column of

the table shows the lengths of the considered patterns

We can see that the longest the pattern the fastest Algorithm works More

over for patterns of lengths up to Algorithm is faster than ShiftOr

m ShiftOr Algorithm

Figure Exp erimental results in seconds

for exact searching random patterns in

the Bible First column gives the lengths of

the considered patterns

Approximate string matching

Figure shows exp erimental results for Algorithm and ShiftAdd while search

ing random patterns in the Bible MO with at most or mismatches

When m is large enough compared to k Algorithm is faster than ShiftAdd for

k m should b e larger than and for k larger than Since mb should

not exceed the word size large values of m cannot b e considered For

the eciency of Algorithm is then limited to k 

For longer patterns we need to use more than a word p er number It is

not dicult to extend the algorithm for this case BaezaYates and Gonnet have

noticed that ShiftAdd is still a go o d practical algorithm for string matching with

mismatches and classes provided the number of words p er number is small

Figure shows results in the case of two bit words p er number Notice that

they extend the results in Figure

Conclusion

We have developed an algorithm combining b oth the programming practicality of

the ShiftAdd metho d and the sp eed of the BM approach Flexibility is another

advantage of this algorithm In fact it can b e easily adapted to classes of

patterns

Nevertheless as for the BM algorithm the larger the alphab et and the longer

the pattern the faster our algorithm works For a large alphab et ASCI I co de

k +4

op erations the searching step do es on average n

mk

In some cases it is necessary to consider small alphab ets In particular

in molecular biology when detecting p otential geneco ding sequences in ge

nomic DNA sequences The considered alphab et consists in the four nucleotides

k k

m ShiftAdd Algorithm ShiftAdd Algorithm

157.56 157.24

155.83 156.25

154.05 155.64

148.13 156.33

145.85 150.72

141.67

135.80

129.40

125.28

121.08

117.21

Figure Exp erimental results in seconds for searching

random patterns in the Bible with at most or

mismatches For k and m more than one word

p er number is needed

k k k

m ShiftAdd Algorithme ShiftAdd Algorithme ShiftAdd Algorithme

275.96 282.38

266.48 278.75

250.82 265.46

219.24 238.18 256.98

211.78 227.57 246.60

205.70

198.13

194.20

188.52

Figure Complementary results when mb Two bit words p er number are used

fA C G T g For such alphab ets our algorithm do es n op erations with

provided that the length m of the pattern is very large compared to k

In order to consider large patterns one solution is to use more than a bit

word p er number Moreover BaezaYates and Perleberg BYP have devel

op ed an algorithm for approximate string matching based on the same naive

metho d than for the ShiftAdd algorithm but using arrays instead of numbers

In this case there is no condition on the length of the searched pattern however

the algorithm is slower The main dierence is that BYP considers the number

of matches instead of the number of mismatches The BYP algorithm can b e

adapted from BM in the same way the ShiftAdd was and it is then p ossible to

consider long patterns

References

K Abrahamson Generalized string matching SIAM J Comput

December

A Aho and M Corasick Ecient string matching an aid to bibliographic search

Commun ACM

T Akutsu Approximate string matching with dont care characters In

M Cro chemore and D Guseld editors Lecture Notes in Computer Science vol

th

ume of Combinatorial Pattern Matching Annual Symposium CPM

pages SpringerVerlag

R BaezaYates and GHGonnet Fast string matching with k mismatches Tech

nical Rep ort CS Data Structuring Group September

R BaezaYates and G Gonnet Ecient text searching of regular expressions

th International collo quium on Automata Languages and Programming Stresa

Italy July

R BaezaYates and G Gonnet A new approach to text searching Commun

ACM Octob er

R BaezaYates and C Perleberg Fast and practical approximate string match

ing In Lecture Notes in Computer Science volume of Combinatorial Pattern

th

Matching Annual Symposium CPM pages SpringerVerlag

A Bertossiand and F Logi Parallel string matching with variable length dont

cares Journal of parallel and distributed computing

R Boyer and J Mo ore A fast string searching algorithm Commun ACM

Octob er

M Fischer and M Paterson Stringmatching and other pro ducts In R Karp

editor Complexity of Computation SIAMAMS Proceedings volume pages

American Mathematical So ciety Providence RI

Z Galil and R Giancarlo Improved string matching with k mismatches SIGACT

News

R Grossi and F Luccio Simple and ecient string matching with k mismatches

Inf Proc Letters November

D Knuth J Morris and V Pratt Fast pattern matching in strings SIAM J

Comput June

G Kucherov and M Rusinowitch Matching a set of strings with variable length

dont cares In Z Galil and E Ukkonen editors Lecture Notes in Computer Sci

ence volume of th annual symposiumCPM pages Esp o oFinland

SpringerVerlag July

G Landau and U Vishkin Ecient string matching with k mismatches Theoret

Comput Sci

U Manber and R BaezaYates An algorithm for string matching with a sequence

of dont cares Information Proceeding Letters

R Pinter Ecient string matching whith dontcare patterns In A Ap ostolico

and Z Galil editors Combinatorial Algorithms on Words volume F pages

SpringerVerlag

J Tarhio and E Ukkonen Boyermoore approach to approximate string matching

In J R Gilb ert and R G Karlsson editors Lecture Notes in Computer Science

volume of nd Scandinavian Workshop in Algorithmic Theory SWAT pages

Bergen Norway SpringerVerlag July

S Wu and U Manber Fast text searching allowing errors Commun ACM

Octob er

A

This article was pro cessed using the L T X macro package with LLNCS style E