### On pattern matching with k mismatches and few don't cares

**2016-02-01**

1602.00621 | cs.DS

We consider the problem of pattern matching with $k$ mismatches, where there
can be don't care or wild card characters in the pattern. Specifically, given a
pattern $P$ of length $m$ and a text $T$ of length $n$, we want to find all
occurrences of $P$ in $T$ that have no more than $k$ mismatches. The pattern
can have don't care characters, which match any character. Without don't cares,
the best known algorithm for pattern matching with $k$ mismatches has a runtime
of $O(n\sqrt{k \log k})$. With don't cares in the pattern, the best
deterministic algorithm has a runtime of $O(nk polylog m)$. Therefore, there is
an important gap between the versions with and without don't cares.
In this paper we give an algorithm whose runtime increases with the number of
don't cares. We define an {\em island} to be a maximal length substring of $P$
that does not contain don't cares. Let $q$ be the number of islands in $P$. We
present an algorithm that runs in $O(n\sqrt{k\log m}+n\min\{\sqrt[3]{qk\log^2
m},\sqrt{q\log m}\})$ time. If the number of islands $q$ is $O(k)$ this runtime
becomes $O(n\sqrt{k\log m})$, which essentially matches the best known runtime
for pattern matching with $k$ mismatches without don't cares. If the number of
islands $q$ is $O(k^2)$, this algorithm is asymptotically faster than the
previous best algorithm for pattern matching with $k$ mismatches with don't
cares in the pattern.

**Login to like/save this paper, take notes and configure your recommendations**

# Related Articles

**2018-02-16**

1802.05906 | cs.DS

Gagie, Navarro and Prezza's $r$-index (SODA, 2018) promises to speed up DNA
alignment and variation … show more

**2019-02-14**

1902.05224 | cs.DS

Converting a compressed format of a string into another compressed format
without an explicit decomp… show more

**2019-05-07**

1905.02589 | cs.DS

Given an indeterminate string pattern $p$ and an indeterminate string text
$t$, the problem of order… show more

**2019-05-07**

1905.02298 | cs.DS

An elastic-degenerate (ED) string is a sequence of $n$ sets of strings of
total length $N$, which wa… show more

**2019-01-25**

1901.08729 | cs.CR

With the advent of large-scale heterogeneous search engines comes the problem
of unified search cont… show more

**2017-10-10**

1710.03395 | cs.DS

The dictionary matching is a task to find all occurrences of patterns in a
set $D$ (called a diction… show more

**2018-12-02**

1812.00421 | cs.DS

Unbalanced translocations are among the most frequent chromosomal
alterations, accounted for 30\% of… show more

**2018-12-02**

1812.00359 | cs.DS

We consider two closely related problems of text indexing in a sub-linear
working space. The first p… show more

**2016-07-19**

1607.05626 | cs.DS

We present a new streaming algorithm for the $k$-Mismatch problem, one of the
most basic problems in… show more

**2017-12-22**

1712.08573 | cs.DS

In the longest common substring problem, we are given two strings of length
$n$ and must find a subs… show more

**2018-10-08**

1810.03551 | cs.DS

We consider the approximate pattern matching problem under edit distance. In
this problem we are giv… show more

**2016-06-15**

1606.04763 | cs.DS

The pattern matching problem with swaps is to find all occurrences of a
pattern in a text while allo… show more

**2018-10-03**

1810.01676 | cs.DS

Given a text $T$ of length $n$ and a pattern $P$ of length $m$, the
approximate pattern matching pro… show more

**2018-09-07**

1809.02517 | cs.DS

In the $k$-mismatch problem we are given a pattern of length $m$ and a text
and must find all locati… show more

**2018-11-10**

1811.04300 | cs.DS

We present an algorithm for approximating the edit distance
$\operatorname{ed}(x, y)$ between two st… show more

**2014-01-29**

1401.7416 | cs.DS

String matching algorithm plays the vital role in the Computational Biology.
The functional and stru… show more