In this paper we present a streaming algorithm for Pattern Matching with Swaps (Swap Matching).

Abstract

The pattern matching problem with swaps is to find all occurrences of a pattern in a text while allowing the pattern to swap adjacent symbols. The goal is to design a fast matching algorithm that takes advantage of the bit parallelism of bitwise machine instructions and has only streaming access to the input. We introduce a new approach to solving this problem based on the graph-theoretic model and compare its performance to that of previously known algorithms. We also show that an approach using deterministic finite automata cannot achieve similarly efficient algorithms. Furthermore, we describe a fatal flaw in some of the previously published algorithms based on the same model. Finally, we provide an experimental evaluation of our algorithm on real-world data.
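To make the problem statement concrete, here is a naive reference checker in C++. It assumes the standard definition in which each pattern symbol takes part in at most one swap of adjacent symbols. The function names (swapMatchesAt, naiveSwapMatch) are hypothetical, and the checker needs random access to the text and O(nm) time, so it is only a baseline for testing, not the streaming algorithm of the paper. A left-to-right greedy scan is enough: if p[j] already equals the text symbol and a swap at j would also fit, then p[j] = p[j+1] and the swap changes nothing.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if pattern p swap-matches text t at start position i, assuming that
// only disjoint swaps of adjacent pattern symbols are allowed.
bool swapMatchesAt(const std::string& p, const std::string& t, std::size_t i) {
    const std::size_t m = p.size();
    if (i + m > t.size()) return false;
    std::size_t j = 0;
    while (j < m) {
        if (p[j] == t[i + j]) {
            ++j;  // p[j] stays in place
        } else if (j + 1 < m && p[j] == t[i + j + 1] && p[j + 1] == t[i + j]) {
            j += 2;  // p[j] and p[j+1] are swapped; both are now used up
        } else {
            return false;
        }
    }
    return true;
}

// Start positions of all swap occurrences of p in t (quadratic-time baseline).
std::vector<std::size_t> naiveSwapMatch(const std::string& p, const std::string& t) {
    std::vector<std::size_t> occ;
    if (p.empty() || p.size() > t.size()) return occ;
    for (std::size_t i = 0; i + p.size() <= t.size(); ++i)
        if (swapMatchesAt(p, t, i)) occ.push_back(i);
    return occ;
}
```

For example, naiveSwapMatch("abc", "bacacb") returns {0, 3}: "bac" and "acb" each differ from "abc" by one adjacent swap.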

Codes & Other

All related code materials: codes.tar.gz

The code was tested on the following corpora:

Implementation in C++

In practice, the required data types may be much larger than 4 * word size, so we provide a GSM implementation in C++ that uses bitsets.
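The following is a minimal sketch of how std::bitset lets a bit-parallel, streaming swap matcher handle patterns longer than a machine word. It is our own illustration, not the GSM code from codes.tar.gz, and may differ from the paper's algorithm. It assumes a three-row model in which each pattern position is either unswapped (row 0) or the left (row +1) or right (row -1) half of a swap. The class BitsetSwapMatcher, the function streamSwapMatch, and the template bound N on the pattern length are hypothetical names chosen for this example.

```cpp
#include <bitset>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch, not the GSM code from codes.tar.gz.
// Bit j of each state vector refers to pattern position j in one row:
//   d0_: position j holds p[j]   (not swapped)
//   dp_: position j holds p[j+1] (swapped with the next position)
//   dm_: position j holds p[j-1] (swapped with the previous position)
template <std::size_t N>
class BitsetSwapMatcher {
public:
    explicit BitsetSwapMatcher(const std::string& p)
        : m_(p.size()), m0_(256), mp_(256), mm_(256) {
        if (m_ == 0 || m_ > N) throw std::invalid_argument("need 1 <= |p| <= N");
        // For every symbol c, record the positions where c may appear in each row.
        for (std::size_t j = 0; j < m_; ++j) {
            m0_[static_cast<unsigned char>(p[j])].set(j);
            if (j + 1 < m_) mp_[static_cast<unsigned char>(p[j + 1])].set(j);
            if (j >= 1) mm_[static_cast<unsigned char>(p[j - 1])].set(j);
        }
    }

    // Reads one text symbol; returns true if an occurrence ends at it.
    bool push(char ch) {
        const unsigned char c = static_cast<unsigned char>(ch);
        // Rows 0 and +1 continue from row 0 or row -1 one position earlier;
        // bit 0 is set so that every text position can start an occurrence.
        std::bitset<N> x = (d0_ | dm_) << 1;
        x.set(0);
        // Row -1 may only follow row +1 (the other half of the same swap);
        // it is computed first because it needs the previous dp_.
        const std::bitset<N> nextDm = (dp_ << 1) & mm_[c];
        d0_ = x & m0_[c];
        dp_ = x & mp_[c];
        dm_ = nextDm;
        return d0_.test(m_ - 1) || dm_.test(m_ - 1);
    }

private:
    std::size_t m_;
    std::vector<std::bitset<N>> m0_, mp_, mm_;  // per-symbol masks, one per row
    std::bitset<N> d0_, dp_, dm_;               // current state vectors
};

// Start positions of all swap occurrences of p in t, reading t once.
template <std::size_t N>
std::vector<std::size_t> streamSwapMatch(const std::string& p, const std::string& t) {
    BitsetSwapMatcher<N> matcher(p);
    std::vector<std::size_t> occ;
    for (std::size_t i = 0; i < t.size(); ++i)
        if (matcher.push(t[i])) occ.push_back(i + 1 - p.size());
    return occ;
}
```

For example, streamSwapMatch<128>("abc", "bacacb") should return {0, 3}. The text is consumed one symbol at a time and only the three state vectors are updated per symbol, so the stream never needs to be buffered. The per-symbol masks cost 3 * 256 * N bits, which is why they are kept on the heap in a std::vector.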

Presentation

The paper was presented at MACIS 2017.