1.  String Matching

1.1  definitions

1.2  string matching problem

The obvious solution is to check for X starting at each Y(i) , giving O(nm)

Less obvious solutions by KMP(Knuth, Morris, and Pratt) and BM (Boyer, Moore) are O(m+n) proceed in 2 phases:

  1. analyze X so that a failure to match at point Y(j) starting at Y(i) will allow you to restart the pattern match later in Y than Y(i+1)
  2. Do the pattern match.

1.3  more definitions

1.4  periodicity lemma

If Y has periods p,q with p+q £ m then Y has period GCD(p,q) .

1.4.1 proof:

if p = q done.

else, assume p > q

y has period p-q or Y(i) = Y(i+(p-q)) for 1 £ i £ m-(p-q)

  1. Y(i) = Y(i+p) for 1 £ i £ m-p
  2. Y(i) = Y(i-q) for q+1 £ i £ m
  3. 1+2 Þ Y(i-q) = Y(i+p-q) for q+1 £ i £ m-p+q
  4. 2+3 Þ Y(i) = Y(i-q) = Y(i+(p-q)) for q+1 £ i £ m-(p-q)
  5. You can also get: Y(i) = Y(i+p) = Y(i+p-q) for 1 £ i £ m-p

We just need m-p ³ q to satisfy p+q £ m . Now, apply Euclids theorem to get that Y has period GCD(p,q)

1.5  finding periods

To find the period, P , of Y , just offset Y by P and check for matching overlaps.

1.6  Another lemma:

|Y| = m with minimum period P and Z with |Z| = n ³ m then

  1. if Y occurs in Z at i and j then |i-j| ³ p
  2. if Y occurs in Z at i and i+d where d £ m-p then p divides d .

1.6.1 proof:

if i-j ³ m then we are done.

The rest of the proof is by pictures.

1.7  The witness array

|Y| = m and is of size min{P,m/2}

2.  The algorithm

Duel(i,j)

input: pattern Y plus a witness array and a text, Z with |i-j| £ (p,m/2)

output: return i or j

  1. compare Y to Z starting at Z(i) and Z(j) . Check the witness point.

We have eliminated one element of a pair of patterns in unit time.

3.  text analysis in the nonperiodic case

Input: pattern Y with witness vector, text Z

  1. Divide Z into chunks, Ti of size m/2
  2. In parallel, check if Y starts in Ti using Duel .
  3. Eliminate all but one occurence by using Duel .
  4. Check the last case by brute force.

3.1  Timing analyses

So the total is T(n) = log(n) and W(n) = n


File translated from TEX by TTH, version 1.30.