A Comparison of Approximate String Matching Algorithms
Corresponding Author
PETTERI JOKINEN
Department of Computer Science, P.O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland
Department of Computer Science, P.O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, FinlandSearch for more papers by this authorJORMA TARHIO
Search for more papers by this authorESKO UKKONEN
Search for more papers by this authorCorresponding Author
PETTERI JOKINEN
Department of Computer Science, P.O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland
Department of Computer Science, P.O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, FinlandSearch for more papers by this authorJORMA TARHIO
Search for more papers by this authorESKO UKKONEN
Search for more papers by this authorAbstract
Experimental comparisons of the running time of approximate string matching algorithms for the k differences problem are presented. Given a pattern string, a text string, and an integer k, the task is to find all approximate occurrences of the pattern in the text with at most k differences (insertions, deletions, changes). We consider seven algorithms based on different approaches including dynamic programming, Boyer–Moore string matching, suffix automata, and the distribution of characters. It turns out that none of the algorithms is the best for all values of the problem parameters, and the speed differences between the methods can be considerable.
References
- 1
Z. Galil and
R. Giancarlo,
‘Data structures and algorithms for approximate string matching’,
Journal of Complexity
4,
33–72
(1988).
10.1016/0885-064X(88)90008-8 Google Scholar
- 2
P. Sellers,
‘The theory and computation of evolutionary distances: pattern recognition’,
Journal of Algorithms
1,
359–372
(1980).
10.1016/0196-6774(80)90016-4 Google Scholar
- 3 E. Ukkonen, ‘Finding approximate patterns in strings’, Journal of Algorithms 6, 132–137 (1985).
- 4 G. Landau and U. Vishkin, ‘Fast string matching with k differences’, Journal of Computer and System Sciences 37, 63–78 (1988).
- 5 G. Landau and U. Vishkin, ‘Fast parallel and serial approximate string matching’, Journal of Algorithms 10, 157–169 (1989).
- 6 Z. Galil and K. Park, ‘An improved algorithm for approximate string matching’, SIAM Journal on Computing 19, 989–999 (1990).
- 7 E. Ukkonen and D. Wood, ‘Approximate string matching with suffix automata’, Algorithmica 10, 353–364 (1993).
- 8 J. Tarhio and E. Ukkonen, ‘ Boyer-Moore approach to approximate string matching’, in J. Gilbert and R. Karlson (eds), SWAT90, 2nd Scandinavian Workshop on Algorithm Theory, Lecture Notes in Computer Science 447, Springer-Verlag, Berlin, 1990, pp. 348–359.
- 9 J. Tarhio and E. Ukkonen, ‘Approximate Boyer-Moore string matching’, SIAM Journal on Computing 22, 243–260 (1993).
- 10 E. Ukkonen, ‘Approximate string matching with q-grams and maximal matches’, Theoretical Computer Science 92, 191–211 (1992).
- 11 W. Chang and E. Lawler, ‘Sublinear approximate string matching and biological applications’, Algorithmica 12, 327–344 (1994).
- 12 T. Takaoka, ‘ Approximate pattern matching with samples’, Proceedings of ISAAC ′94, Lecture Notes in Computer Science 834, Springer-Verlag, Berlin, 1994, pp. 234–242.
- 13 E. Sutinen and J. Tarhio, ‘ On using q-gram locations in approximate string matching’, P. Spirakis (ed.), Proc. 3rd Annual European Symposium on Algorithms ESA ′95, Lecture Notes in Computer Science 979, Springer, Berlin, 1995, pp. 327–340.
- 14 W. Chang and T. Marr, ‘ Approximate string matching and local similarity’, in M. Crochemore and D. Gusfield (eds), Combinatorial Pattern Matching, Proceedings of 5th Annual Symposium, Lecture Notes in Computer Science 807, Springer-Verlag, Berlin, 1994, pp. 259–273.
- 15 W. Chang and J. Lampe, ‘ Theoretical and empirical comparisons of approximate string matching algorithms’, in A. Apostolico, M. Gochemore, Z. Galil and U. Manter (eds), Combinatorial Pattern Matching, Proceedings of Third Annual Symposium, Lecture Notes in Computer Science 644, Springer-Verlag, Berlin, 1992, pp. 175–184.
- 16 S. Wu and U. Manber, ‘Fast text searching allowing errors’, Communications of ACM 35, 83–91 (1992).
- 17 R. Baeza-Yates and C. Perleberg, ‘ Fast and practical approximate string matching algorithms’, in A. Apostolico, M. Gochemore, Z. Galil and U. Manter. (eds), Combinatorial Pattern Matching, Proceedings of Third Annual Symposium, Lecture Notes in Computer Science 644, Springer-Verlag, Berlin, 1992, pp. 185–192.
- 18 R. Grossi and F. Luccio, ‘Simple and efficient string matching with k mismatches’, Information Processing Letters 33, 113–120 (1989).
- 19 R. Baeza-Yates and G. Gonnet, ‘A new approach to text searching’, Communications of ACM 35, 74–82 (1992).
- 20 P. Pevzner and M. Waterman, ‘Multiple filtration and approximate pattern matching’, Algorithmica 13, 135–154 (1995).
- 21 R. Wagner and M. Fischer, ‘The string-to-string correction problem’, Journal of the ACM 21, 168–173 (1975).
- 22 E. Ukkonen, ‘Algorithms for approximate string matching’, Information Control 64, 100–118 (1985).
- 23 M. Crochemore, ‘ String matching with constraints’, 13th Symposium on Mathematical Foundations of Computer Science, Lecture Notes in Computer Science 324, Springer-Verlag, Berlin, 1988, pp. 44–58.
- 24 A. Blumer, J. Blumer, D. Haussler, A. Ehrenfeucht, M. T. Chen and J. Seiferas, ‘The smallest automaton recognizing the subwords of a text’, Theoretical Computer Science 40, 31–55 (1985).
- 25 R. Boyer and S. Moore, ‘A fast string searching algorithm’, Communications of the ACM 20, 762–772 (1977).
- 26 N. Horspool, ‘Practical fast searching in strings’, Software - Practice and Experience 10, 501–506 (1980).