ML p(r)ior | On pattern matching with k mismatches and few don't cares

On pattern matching with k mismatches and few don't cares

2016-02-01
1602.00621 | cs.DS
We consider the problem of pattern matching with $k$ mismatches, where there can be don't care or wild card characters in the pattern. Specifically, given a pattern $P$ of length $m$ and a text $T$ of length $n$, we want to find all occurrences of $P$ in $T$ that have no more than $k$ mismatches. The pattern can have don't care characters, which match any character. Without don't cares, the best known algorithm for pattern matching with $k$ mismatches has a runtime of $O(n\sqrt{k \log k})$. With don't cares in the pattern, the best deterministic algorithm has a runtime of $O(nk polylog m)$. Therefore, there is an important gap between the versions with and without don't cares. In this paper we give an algorithm whose runtime increases with the number of don't cares. We define an {\em island} to be a maximal length substring of $P$ that does not contain don't cares. Let $q$ be the number of islands in $P$. We present an algorithm that runs in $O(n\sqrt{k\log m}+n\min\{\sqrt[3]{qk\log^2 m},\sqrt{q\log m}\})$ time. If the number of islands $q$ is $O(k)$ this runtime becomes $O(n\sqrt{k\log m})$, which essentially matches the best known runtime for pattern matching with $k$ mismatches without don't cares. If the number of islands $q$ is $O(k^2)$, this algorithm is asymptotically faster than the previous best algorithm for pattern matching with $k$ mismatches with don't cares in the pattern.

Highlights - Most important sentences from the article

Related Articles

2018-02-16
1802.05906 | cs.DS

Gagie, Navarro and Prezza's $r$-index (SODA, 2018) promises to speed up DNA alignment and variation … show more

Highlights - Most important sentences from the article

2019-02-14
1902.05224 | cs.DS

Converting a compressed format of a string into another compressed format without an explicit decomp… show more

Highlights - Most important sentences from the article

2019-05-07
1905.02589 | cs.DS

Given an indeterminate string pattern $p$ and an indeterminate string text $t$, the problem of order… show more

Highlights - Most important sentences from the article

2019-05-07
1905.02298 | cs.DS

An elastic-degenerate (ED) string is a sequence of $n$ sets of strings of total length $N$, which wa… show more

Highlights - Most important sentences from the article

2019-01-25
1901.08729 | cs.CR

With the advent of large-scale heterogeneous search engines comes the problem of unified search cont… show more

Highlights - Most important sentences from the article

2017-10-10
1710.03395 | cs.DS

The dictionary matching is a task to find all occurrences of patterns in a set $D$ (called a diction… show more

Highlights - Most important sentences from the article

2018-12-02
1812.00421 | cs.DS

Unbalanced translocations are among the most frequent chromosomal alterations, accounted for 30\% of… show more

Highlights - Most important sentences from the article

2018-12-02
1812.00359 | cs.DS

We consider two closely related problems of text indexing in a sub-linear working space. The first p… show more

Highlights - Most important sentences from the article

2016-07-19
1607.05626 | cs.DS

We present a new streaming algorithm for the $k$-Mismatch problem, one of the most basic problems in… show more

Highlights - Most important sentences from the article

2017-12-22
1712.08573 | cs.DS

In the longest common substring problem, we are given two strings of length $n$ and must find a subs… show more

Highlights - Most important sentences from the article

2018-10-08
1810.03551 | cs.DS

We consider the approximate pattern matching problem under edit distance. In this problem we are giv… show more

Highlights - Most important sentences from the article

2016-06-15
1606.04763 | cs.DS

The pattern matching problem with swaps is to find all occurrences of a pattern in a text while allo… show more

Highlights - Most important sentences from the article

2018-10-03
1810.01676 | cs.DS

Given a text $T$ of length $n$ and a pattern $P$ of length $m$, the approximate pattern matching pro… show more

Highlights - Most important sentences from the article

2018-09-07
1809.02517 | cs.DS

In the $k$-mismatch problem we are given a pattern of length $m$ and a text and must find all locati… show more

Highlights - Most important sentences from the article

2018-11-10
1811.04300 | cs.DS

We present an algorithm for approximating the edit distance $\operatorname{ed}(x, y)$ between two st… show more

Highlights - Most important sentences from the article

2014-01-29
1401.7416 | cs.DS

String matching algorithm plays the vital role in the Computational Biology. The functional and stru… show more

Highlights - Most important sentences from the article