Lumænaut_

Question of the Week (Cassidoo).
#1: Bitap Algorithm

Fuzzy string search

Illustration in ASCII art style: an RGB-lit keyboard in outer space with a soft yellow nebula behind it.

Fuzzy string search with Bitap: from naive to efficient

A big shoutout to Cassidy Williams (and her incredible Cassidoo newsletter) for this week’s interview brain-teaser. The challenge? Implement fuzzy string search using the Bitap algorithm — and find all matches with up to k errors. This blog post walks through the problem, the math, and the code — from the brute force mindset to the elegant bit-parallel solution that makes Bitap a legend.

If you're here to level up your string algorithms game, buckle up. We’ll write Python you can trust — readable, testable, and tied to the same Levenshtein distance model that production bit-parallel matchers are built to honor — and along the way, we’ll make sure you actually understand why Bitap works.

The problem statement

“Given a text string and a pattern, implement a fuzzy string search using the Bitap algorithm that finds all positions in the text where the pattern matches with at most k errors (insertions, deletions, or substitutions). Return an array of objects containing the position and the number of errors at that match.”

Example:


> fuzzySearch("the cat sat on the mat", "cat", 0);
> [{ position: 4, errors: 0 }]

> fuzzySearch("cassidoo", "cool", 1);
> []

> fuzzySearch("cassidoo", "cool", 3);
> [{ position: 0, errors: 3 }, { position: 4, errors: 3 }]
          

(Note: With k = 3, both matches align "cool" against the length-4 slice of text starting at that index using three edits. Levenshtein allows substitutions and gaps; a match is not the same as finding "cool" as a literal unchanged substring.)
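If you want to convince yourself of those counts before trusting any matcher, a throwaway DP check does it. This sketch assumes nothing beyond the example strings from the prompt:

```python
def edit_distance(a, b):
    """Plain two-row Levenshtein DP, used only as a sanity check."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[-1] + 1,                 # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute / match
        prev = cur
    return prev[-1]

# The two k = 3 hits align "cool" against length-4 slices of "cassidoo":
assert edit_distance("cool", "cass") == 3  # slice starting at position 0
assert edit_distance("cool", "idoo") == 3  # slice starting at position 4
```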

At first glance, this is like a “search with typos allowed”. But instead of returning just a boolean, we need exact match positions along with the error count for each match. The Bitap algorithm (also known as shift-or or Baeza-Yates–Gonnet) is famously used in agrep (approximate grep) and is a perfect fit.

Restating in plain English

Imagine you're building a “did you mean?” feature. The user types "cassidoo" and looks for "cool". With k=3, you allow up to three edits (Levenshtein style). A production Bitap engine efficiently decides, while streaming the text, whether the pattern aligns within ≤ k errors. Carrying the exact error count in lockstep may require extra bitmask lanes or a tie-breaking pass; the interview version asks for that count explicitly, so the clearest reference is still the DP slice below, while the masks handle the yes/no “within budget?” question at full text speed.

Our job:

  • Scan the text character by character.
  • Maintain state vectors (bitmasks) representing how well the pattern matches the current suffix.
  • For each window start i, record {position, errors} when the Levenshtein distance between pattern and text[i : i+m] is ≤ k.

Indexing convention. The prompt uses the start index i of a fixed-length slice text[i : i+m]. Many streaming Bitap write-ups report the end of a hit instead; they differ by m - 1 once you know the pattern length.
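A tiny helper makes the conversion explicit; the numbers below use the "cat" example from the prompt:

```python
def end_to_start(end_index, m):
    """Convert the inclusive end index of a length-m hit to its start index."""
    return end_index - m + 1

# In "the cat sat on the mat", the exact hit "cat" occupies indices 4..6:
# a streaming matcher reports end index 6, the prompt wants start index 4.
assert end_to_start(6, 3) == 4
```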

From naive to bit magic: a journey

1. The brute-force baseline (Levenshtein distance at every window)

The most straightforward approach: slide a window of size len(pattern) over the text, compute the Levenshtein distance between the pattern and each window, and keep those where distance ≤ k. Classic dynamic-programming Levenshtein on two length-m slices costs O(m²) time per window, so sliding the window yields O(n · m²) overall. You can cut space to O(m) with two rows, but each window still needs Θ(m²) time unless you add extra structure (for example a band when k is small). Either way, long texts hurt.

Let’s sketch it: for each index i in text, compare pattern with text[i:i+len(pattern)] using dynamic programming. This is simple but inefficient: for each position you recompute the whole edit distance matrix.


def naive_fuzzy(text, pattern, k):
    results = []
    m = len(pattern)
    for i in range(len(text) - m + 1):
        dist = levenshtein(pattern, text[i:i+m])
        if dist <= k:
            results.append({"position": i, "errors": dist})
    return results

def levenshtein(a, b):
    # classic DP O(m*n) but here m is fixed
    dp = [[0]*(len(b)+1) for _ in range(len(a)+1)]
    for i in range(len(a)+1):
        dp[i][0] = i
    for j in range(len(b)+1):
        dp[0][j] = j
    for i in range(1, len(a)+1):
        for j in range(1, len(b)+1):
            cost = 0 if a[i-1] == b[j-1] else 1
            dp[i][j] = min(dp[i-1][j]+1, dp[i][j-1]+1, dp[i-1][j-1]+cost)
    return dp[len(a)][len(b)]
            

This gets the job done, but each comparison costs O(m²) and we repeat it up to n times — far from ideal for large documents.

2. Enter Bitap — the shift-or algorithm (exact match)

Before tackling fuzzy search, let's understand exact matching with Bitap. The idea: track which prefixes of the pattern can still line up with the recent suffix of the text using a bit-vector R in a machine word — the NFA simulation runs in parallel along those bits.

Mask convention (shift-OR). For each text character c, precompute pattern_mask[c] with a 0 at bit positions where the pattern has character c, and 1 elsewhere (mismatch). For each new c from the text:

  • R = (R << 1) | pattern_mask[c]
The shift advances candidate prefixes; the OR sets bits wherever the new character disagrees with the pattern, so a surviving prefix keeps its 0 bit. Some textbooks store inverted masks and describe an equivalent shift-AND update — same underlying automaton, different bit polarity. When you implement from a paper, keep one convention end-to-end. The exact k = 0 listing below uses the equivalent OR-then-shift ordering from classic C sources (R |= pattern_mask[c]; R <<= 1) — same automaton, different bookkeeping around the shift.
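To make the polarity concrete, here is a sketch of mask construction for the pattern "cat" under the shift-or convention (bit cleared where the pattern has the character); characters absent from the pattern simply get an all-ones mask at lookup time:

```python
def build_shift_or_masks(pattern):
    """Bit j of masks[c] is 0 iff pattern[j] == c, 1 otherwise."""
    m = len(pattern)
    all_ones = (1 << m) - 1
    masks = {}
    for j, c in enumerate(pattern):
        masks[c] = masks.get(c, all_ones) & ~(1 << j)
    return masks

masks = build_shift_or_masks("cat")
# Bit 0 tracks 'c', bit 1 tracks 'a', bit 2 tracks 't':
assert masks == {"c": 0b110, "a": 0b101, "t": 0b011}
```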

Word size. You need m bits for a length-m pattern (plus bookkeeping). If m exceeds the hardware word size w (often 64), the vector is split across several words and each step costs on the order of ⌈m / w⌉ word updates instead of one — fuzzy variants that keep several masks per error tier scale the same way.
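To see what ⌈m / w⌉ updates per step means in practice, here is a hedged sketch of shifting an (m + 1)-bit state vector that no longer fits one word. The word size W and the little-endian word order are assumptions for illustration, not from any particular agrep source:

```python
W = 64  # assumed hardware word size

def words_needed(m):
    """Words required for m pattern bits plus one sentinel bit."""
    return -(-(m + 1) // W)  # ceil((m + 1) / W)

def shift_left_1(words, m):
    """One-bit left shift of an (m + 1)-bit vector stored as
    little-endian W-bit words, propagating a carry between words."""
    out, carry = [], 0
    for w in words:
        out.append(((w << 1) | carry) & ((1 << W) - 1))
        carry = w >> (W - 1)
    out[-1] &= (1 << ((m + 1) - (len(out) - 1) * W)) - 1  # trim past the sentinel
    return out
```

Every per-error-level update in a fuzzy matcher pays this ⌈m / w⌉ inner loop, which is exactly where the extra word-count factor in the running time comes from.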

For fuzzy search we need to count errors — that’s where state grows beyond a single R.

3. Bitap for fuzzy search (with ≤ k errors)

The fuzzy idea is still the same family as shift-or: maintain a small stack of bitmasks — one per error budget — so a stream of text characters updates every edit-distance “frontier” in lockstep. Full Levenshtein (insertions, deletions, and substitutions) needs the Wu–Manber / Baeza-Yates–Navarro style extensions; the exact bitwise layouts differ from the exact matcher because deletions and insertions are no longer a single “OR with the mask” story.

Extending this to the full Levenshtein model is not a small tweak: you must combine several bitmasks per error level with a precise recurrence — easy to get wrong, which is why serious writeups rely on those published layouts instead of a half-page sketch.

Rather than paste a brittle skeleton that is easy to get wrong (bit polarity and off-by-one semantics are ruthless), the listing below implements the same Levenshtein model as the brute-force baseline: for every start index i, align pattern with the length-m slice text[i : i+m], score edits, and keep results with distance ≤ k. It is the reference truth your readers can unit-test against; a production agrep-style engine would replace the inner loop with the bit-parallel recurrence.

A correct Python implementation (same model as the baseline)

This returns {position, errors} objects in the shape of the prompt. It is intentionally plain: correctness and clarity first; Bitap is the fast path that implements the same distance scoring as this baseline.


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,
                dp[i][j - 1] + 1,
                dp[i - 1][j - 1] + cost,
            )
    return dp[len(a)][len(b)]


def fuzzy_search(text, pattern, k):
    m = len(pattern)
    if m == 0:
        return []
    results = []
    for i in range(len(text) - m + 1):
        dist = levenshtein(pattern, text[i : i + m])
        if dist <= k:
            results.append({"position": i, "errors": dist})
    return results


if __name__ == "__main__":
    print(fuzzy_search("the cat sat on the mat", "cat", 0))
    # [{'position': 4, 'errors': 0}]

    print(fuzzy_search("cassidoo", "cool", 1))
    # []

    print(fuzzy_search("cassidoo", "cool", 3))
    # [{'position': 0, 'errors': 3}, {'position': 4, 'errors': 3}]
          

Precise Bitap (k = 0) — minimal shift-or in Python

This is a compact, standard shift-or implementation: character masks use cleared bits (0) where the pattern matches, and the state word R is updated with | then << 1 on each text character. It finds every start index where a length-m slice of the text equals pattern exactly. The state needs m + 1 bits (pattern plus a sentinel), so a fixed 64-bit machine word caps the pattern at 63 characters; Python’s arbitrary-precision integers lift that cap at some constant-factor cost.


def bitap_exact_positions(text, pattern):
    """Shift-or exact match: all start indices i with text[i : i + m] == pattern."""
    m = len(pattern)
    if m == 0:
        return []
    # (m + 1) bits; bit m is the sentinel (BYG / Wikipedia layout).
    mask_all = (1 << (m + 1)) - 1
    alphabet = set(text) | set(pattern)
    pattern_mask = {c: mask_all for c in alphabet}
    for j in range(m):
        pattern_mask[pattern[j]] &= ~(1 << j)

    R = mask_all ^ 1  # low bit 0, bits 1..m set — same role as ~1 in fixed-width C
    out = []
    for idx, ch in enumerate(text):
        R |= pattern_mask.get(ch, mask_all)
        R = (R << 1) & mask_all
        if (R & (1 << m)) == 0:
            start = idx - m + 1
            if start >= 0:
                out.append(start)
    return out


if __name__ == "__main__":
    sample = "the cat sat on the mat"
    assert bitap_exact_positions(sample, "cat") == [
        hit["position"] for hit in fuzzy_search(sample, "cat", 0)
    ]
          

When k = 1 on length-m windows

Full Levenshtein fuzzy Bitap (k > 0) needs the Wu–Manber / Baeza-Yates–Navarro recurrence on several masks. For the interview window model — two strings of equal length m — there is a shortcut: a single insertion or deletion changes a string’s length, so between equal-length slices a distance of 1 can only come from one substitution. That means distance ≤ 1 reduces to Hamming distance ≤ 1, which a single O(m) scan certifies. It agrees with levenshtein on whether the distance is 0 or 1; if it reports “above 1”, call the full DP only when you need the exact larger value.


def levenshtein_leq1_equal_len(a, b):
    """
    If len(a) == len(b) and Levenshtein(a, b) <= 1, return that distance (0 or 1).
    Otherwise return None. O(m) time; easy to audit and fuzz-test.
    """
    if len(a) != len(b):
        raise ValueError("expected equal-length slices for this fast path")
    n = len(a)
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    if i == n:
        return 0
    # Equal lengths mean a single edit must be a substitution: an insertion
    # or deletion would change the length. Substitute at the first mismatch
    # and require the rest to match exactly.
    if a[i + 1 :] == b[i + 1 :]:
        return 1
    return None


def fuzzy_search_k1_equal_windows(text, pattern):
    m = len(pattern)
    if m == 0:
        return []
    results = []
    for i in range(len(text) - m + 1):
        d = levenshtein_leq1_equal_len(pattern, text[i : i + m])
        if d is not None:
            results.append({"position": i, "errors": d})
    return results


if __name__ == "__main__":
    assert fuzzy_search_k1_equal_windows("cassidoo", "cool") == fuzzy_search(
        "cassidoo", "cool", 1
    )
          

Drop these snippets next to a full fuzzy bitmask engine in your editor and fuzz on random strings — when the scores diverge, the bug is almost always in the bit-parallel layer you added last, not in the DP reference.
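A minimal sketch of such a harness: it cross-checks two independent edit-distance oracles (an iterative two-row DP and a memoized recursion) on random strings, and the same structure extends to any bit-parallel candidate you slot in. The alphabet, lengths, and trial count are arbitrary choices:

```python
import random
from functools import lru_cache

def levenshtein_dp(a, b):
    """Two-row iterative edit distance (first oracle)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def levenshtein_rec(a, b):
    """Memoized recursive edit distance (second, independent oracle)."""
    @lru_cache(maxsize=None)
    def d(i, j):
        if i == 0 or j == 0:
            return i + j
        return min(d(i - 1, j) + 1,
                   d(i, j - 1) + 1,
                   d(i - 1, j - 1) + (a[i - 1] != b[j - 1]))
    return d(len(a), len(b))

def fuzz(trials=300, seed=42):
    """Fail loudly on the first input where the two oracles disagree."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = "".join(rng.choice("ab") for _ in range(rng.randint(0, 8)))
        b = "".join(rng.choice("ab") for _ in range(rng.randint(0, 8)))
        assert levenshtein_dp(a, b) == levenshtein_rec(a, b), (a, b)
    return True
```

Short random strings over a tiny alphabet hit the nasty corner cases (empty slices, runs of repeats) far faster than hand-picked examples do.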

Complexity & real-world usage

The reference implementation above costs Θ((n − m + 1) · m²) time in the worst case — exactly the kind of redundancy Bitap-class algorithms eliminate. A careful fuzzy Bitap runs in O(n · k · ⌈m / w⌉) word operations for text length n, edit budget k, pattern length m, and machine word size w — i.e. linear in n with multiplicative factors in k and ⌈m / w⌉, instead of Θ(m²) of DP work per sliding window. Production fuzzy matchers preserve the same Levenshtein semantics but drive the cost down with those bit-parallel updates.

Where is Bitap used? The agrep Unix command, grep’s fuzzy sibling, uses it under the hood. Also, some code editors and search tools rely on similar algorithms for typo-tolerant search.

Wrapping up

We started with brute force, zoomed in on how shift-or packs NFA states into integers, added a working exact Bitap plus a tight k = 1 window check, and kept the full DP slice as the interview’s verifiable scoring model. Pair the intuition with code you can fuzz-test; once the baseline is trusted, any bit-parallel shortcut has to match it bit-for-bit on the distances. Next time you see a “Did you mean …?” feature, you’ll know one way it might be implemented.

Happy coding, and may your searches always be fuzzy in the right way!

