Question of the Week (Cassidoo).
#1: Bitap Algorithm
Fuzzy string search with Bitap: from naive to efficient
A big shoutout to Cassidy Williams (and her incredible Cassidoo newsletter)
for this week’s interview brain-teaser. The challenge? Implement
fuzzy string search using the Bitap algorithm —
and find all matches with up to k errors. This blog
post walks through the problem, the math, and the code — from the
brute force mindset to the elegant bit-parallel solution that
makes Bitap a legend.
If you're here to level up your string algorithms game, buckle up. We’ll write Python you can trust — readable, testable, and tied to the same Levenshtein distance model that production bit-parallel matchers are built to honor — and along the way, we’ll make sure you actually understand why Bitap works.
The problem statement
“Given a text string and a pattern, implement a fuzzy string search using the Bitap algorithm that finds all positions in the text where the pattern matches with at most k errors (insertions, deletions, or substitutions). Return an array of objects containing the position and the number of errors at that match.”
Example:
> fuzzySearch("the cat sat on the mat", "cat", 0);
> [{ position: 4, errors: 0 }]
> fuzzySearch("cassidoo", "cool", 1);
> []
> fuzzySearch("cassidoo", "cool", 3);
> [{ "position": 0, "errors": 3 }, { "position": 4, "errors": 3 }]
(Note: With k = 3, both matches align
"cool" against the length-4 slice of text starting at
that index using three edits. Levenshtein allows substitutions and
gaps; a match is not the same as finding "cool"
as a literal unchanged substring.)
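You can check the claimed error counts with a few lines of throwaway dynamic programming (this tiny helper exists only to verify the worked example; the post's full reference implementation comes later):

```python
def edit_distance(a, b):
    """Two-row Levenshtein DP, only here to verify the worked example."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]

# "cassidoo"[0:4] == "cass" and "cassidoo"[4:8] == "idoo"
print(edit_distance("cool", "cass"))  # 3 (three substitutions)
print(edit_distance("cool", "idoo"))  # 3
```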
At first glance, this is like a “search with typos allowed”. But
instead of returning just a boolean, we need exact match positions
along with the error count for each match. The
Bitap algorithm (also known as shift-or or
Baeza-Yates–Gonnet) is famously used in agrep
(approximate grep) and is a perfect fit.
Restating in plain English
Imagine you're building a “did you mean?” feature. The user types
"cassidoo" and looks for "cool". With
k=3, you allow up to three edits (Levenshtein style).
A production Bitap engine efficiently decides, while streaming the
text, whether the pattern aligns within ≤ k errors.
Carrying the exact error count in lockstep may require extra
bitmask lanes or a tie-breaking pass; our interview shape asks for
that count explicitly, so the clearest reference is still the DP slice
below — while the masks handle the yes/no “within budget?” question at
full text speed.
Our job:
- Scan the text character by character.
- Maintain state vectors (bitmasks) representing how well the pattern matches the current suffix.
- For each window start i, record {position, errors} when the Levenshtein distance between pattern and text[i : i+m] is ≤ k.
Indexing convention. The prompt uses the
start index i of a fixed-length slice
text[i : i+m]. Many streaming Bitap write-ups report the
end of a hit instead; they differ by
m - 1 once you know the pattern length.
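A one-line conversion keeps the two conventions straight (the function name here is purely illustrative):

```python
def end_to_start(end_index, m):
    """Map a hit reported by the index of its last character back to the
    prompt's start-index convention: the two differ by exactly m - 1."""
    return end_index - (m - 1)

# In "the cat sat on the mat", the exact hit "cat" ends at index 6,
# so the start index is 6 - (3 - 1) == 4, matching the prompt's example.
print(end_to_start(6, 3))  # 4
```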
From naive to bit magic: a journey
1. The brute-force baseline (Levenshtein distance at every window)
The most straightforward approach: slide a window of size
len(pattern) over the text, compute Levenshtein
distance between the pattern and each window, and keep those where
distance ≤ k. Classic dynamic-programming Levenshtein on two
length-m slices costs O(m²)
time per window, so sliding the window yields
O(n · m²) overall. You can cut
space to O(m) with two rows, but each window still
needs Θ(m²) time unless you add extra structure (for
example a band when k is small). Either way, long
texts hurt.
Let’s sketch it: for each index i in text,
compare pattern with text[i:i+len(pattern)]
using dynamic programming. This is simple but inefficient: for each
position you recompute the whole edit distance matrix.
def naive_fuzzy(text, pattern, k):
    results = []
    m = len(pattern)
    for i in range(len(text) - m + 1):
        dist = levenshtein(pattern, text[i:i+m])
        if dist <= k:
            results.append({"position": i, "errors": dist})
    return results

def levenshtein(a, b):
    # classic DP; here both strings have the fixed window length m
    dp = [[0]*(len(b)+1) for _ in range(len(a)+1)]
    for i in range(len(a)+1):
        dp[i][0] = i
    for j in range(len(b)+1):
        dp[0][j] = j
    for i in range(1, len(a)+1):
        for j in range(1, len(b)+1):
            cost = 0 if a[i-1] == b[j-1] else 1
            dp[i][j] = min(dp[i-1][j]+1, dp[i][j-1]+1, dp[i-1][j-1]+cost)
    return dp[len(a)][len(b)]
This gets the job done, but each comparison costs O(m²) and we repeat it up to n times — far from ideal for large documents.
2. Enter Bitap — the shift-or algorithm (exact match)
Before tackling fuzzy search, let's understand exact matching with Bitap. The idea: track which prefixes of the pattern can still line up with the recent suffix of the text using a bit-vector R in a machine word — the NFA simulation runs in parallel along those bits.
Mask convention (shift-OR). For each text character
c, precompute pattern_mask[c] with a
0 at bit positions where the pattern has character
c, and 1 elsewhere (mismatch). For
each new c from the text:
R = (R << 1) | pattern_mask[c]
The k = 0 listing below uses the
equivalent OR-then-shift update from classic C
sources (R |= pattern_mask[c]; R <<= 1): same automaton,
same masks, just different bookkeeping around the shift.
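To make the 0-at-match convention concrete, here is a small mask-building sketch for the pattern "cat" (bit j corresponds to pattern position j, so the low bit is the first character):

```python
def build_masks(pattern, alphabet):
    """Shift-or character masks: bit j is 0 iff pattern[j] == c, 1 otherwise."""
    m = len(pattern)
    all_ones = (1 << m) - 1
    masks = {c: all_ones for c in alphabet}
    for j, c in enumerate(pattern):
        masks[c] &= ~(1 << j)
    return masks

masks = build_masks("cat", "cats")
for c in "cats":
    print(c, format(masks[c], "03b"))
# c 110   ('c' is pattern position 0, so bit 0 is clear)
# a 101
# t 011
# s 111   (never matches: all mismatch bits stay set)
```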
Word size. You need m bits for a
length-m pattern (plus bookkeeping). If
m exceeds the hardware word size w (often
64), the vector is split across several words and each step costs on
the order of ⌈m / w⌉ word updates instead of one —
fuzzy variants that keep several masks per error tier scale the same way.
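As a rough sketch of that scaling (ignoring the sentinel bit), the per-character word count is just a ceiling division. Note that the Python listings in this post use arbitrary-precision integers, so CPython quietly does this multi-word bookkeeping for you:

```python
import math

def words_per_step(m, w=64):
    """Approximate machine words touched per text character for an m-bit
    state vector on a w-bit machine: the ceil(m / w) factor from the text."""
    return math.ceil(m / w)

print(words_per_step(12))   # 1: short patterns fit one 64-bit word
print(words_per_step(200))  # 4: long patterns split across several words
```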
For fuzzy search we need to count errors — that’s where state grows beyond a single R.
3. Bitap for fuzzy search (with ≤ k errors)
The fuzzy idea is still the same family as shift-or: maintain a small stack of bitmasks — one per error budget — so a stream of text characters updates every edit distance “frontier” in lockstep. Full Levenshtein (insertions, deletions, and substitutions) needs the Wu–Manber / Baeza-Yates–Navarro style extensions; the exact bitwise layouts differ from the exact matcher because delete and insert are no longer a single “OR with the mask” story.
Extending this to the full Levenshtein model is not a small tweak: you must combine several bitmasks per error level with a precise recurrence — easy to get wrong, which is why serious writeups rely on those published layouts instead of a half-page sketch.
Rather than paste a brittle skeleton that is easy to get wrong
(bit polarity and off-by-one semantics are ruthless), the listing
below implements the same Levenshtein model as the
brute-force baseline: for every start index i, align
pattern with the length-m slice
text[i : i+m], score edits, and keep results with
distance ≤ k. It is the reference truth your readers
can unit-test against; a production agrep-style engine
would replace the inner loop with the bit-parallel recurrence.
A correct Python implementation (same model as the baseline)
This returns {position, errors} objects in the shape
of the prompt. It is intentionally plain: correctness and clarity
first; Bitap is the fast path that implements the same distance
scoring as this baseline.
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,
                dp[i][j - 1] + 1,
                dp[i - 1][j - 1] + cost,
            )
    return dp[len(a)][len(b)]

def fuzzy_search(text, pattern, k):
    m = len(pattern)
    if m == 0:
        return []
    results = []
    for i in range(len(text) - m + 1):
        dist = levenshtein(pattern, text[i : i + m])
        if dist <= k:
            results.append({"position": i, "errors": dist})
    return results

if __name__ == "__main__":
    print(fuzzy_search("the cat sat on the mat", "cat", 0))
    # [{'position': 4, 'errors': 0}]
    print(fuzzy_search("cassidoo", "cool", 1))
    # []
    print(fuzzy_search("cassidoo", "cool", 3))
    # [{'position': 0, 'errors': 3}, {'position': 4, 'errors': 3}]
Precise Bitap (k = 0) — minimal shift-or in Python
This is a compact, standard shift-or implementation:
character masks use cleared bits (0) where the pattern
matches, and the machine word R is updated with
| then << 1 on each text character.
It finds every start index where an exact length-m
slice equals pattern. Requires m + 1 bits
of headroom (pattern lengths in the dozens are typical on real CPUs).
def bitap_exact_positions(text, pattern):
    """Shift-or exact match: all start indices i with text[i : i + m] == pattern."""
    m = len(pattern)
    if m == 0:
        return []
    # (m + 1) bits; bit m is the sentinel (BYG / Wikipedia layout).
    mask_all = (1 << (m + 1)) - 1
    alphabet = set(text) | set(pattern)
    pattern_mask = {c: mask_all for c in alphabet}
    for j in range(m):
        pattern_mask[pattern[j]] &= ~(1 << j)
    R = mask_all ^ 1  # low bit 0, bits 1..m set — same role as ~1 in fixed-width C
    out = []
    for idx, ch in enumerate(text):
        R |= pattern_mask.get(ch, mask_all)
        R = (R << 1) & mask_all
        if (R & (1 << m)) == 0:
            start = idx - m + 1
            if start >= 0:
                out.append(start)
    return out

if __name__ == "__main__":
    sample = "the cat sat on the mat"
    assert bitap_exact_positions(sample, "cat") == [
        hit["position"] for hit in fuzzy_search(sample, "cat", 0)
    ]
When k = 1 on length-m windows
Full Levenshtein fuzzy Bitap (k > 0) needs the
Wu–Manber / Baeza-Yates–Navarro recurrence on several masks. For
the interview window model (two strings of equal
length m) there is an even simpler
O(m) certificate: a lone insertion or deletion changes
the length, so two equal-length strings are within distance 1 exactly
when they agree everywhere or differ in a single position (one
substitution). The helper below counts mismatches; it agrees with
levenshtein on whether the distance is 0 or 1, and when
it reports “above 1” (None), call the full DP only if
you need the exact larger value.
def levenshtein_leq1_equal_len(a, b):
    """
    If len(a) == len(b) and Levenshtein(a, b) <= 1, return that distance (0 or 1).
    Otherwise return None. O(m) time; easy to audit and fuzz-test.
    """
    if len(a) != len(b):
        raise ValueError("expected equal-length slices for this fast path")
    # For equal lengths, an insertion must be paired with a deletion
    # (two edits), so distance <= 1 means zero or one substitution.
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches if mismatches <= 1 else None
def fuzzy_search_k1_equal_windows(text, pattern):
    m = len(pattern)
    if m == 0:
        return []
    results = []
    for i in range(len(text) - m + 1):
        d = levenshtein_leq1_equal_len(pattern, text[i : i + m])
        if d is not None:
            results.append({"position": i, "errors": d})
    return results

if __name__ == "__main__":
    assert fuzzy_search_k1_equal_windows("cassidoo", "cool") == fuzzy_search(
        "cassidoo", "cool", 1
    )
Drop these snippets next to a full fuzzy bitmask engine in your editor and fuzz on random strings; if the scores diverge, the bug is almost always in the parallel layer you added last.
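A minimal version of that fuzzing loop might look like this (self-contained on purpose: it inlines a DP reference and an equal-length mismatch-count fast path rather than importing the listings above):

```python
import random

def levenshtein_ref(a, b):
    """One-row Levenshtein DP used as ground truth."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev = dp[0]
        dp[0] = i
        for j, cb in enumerate(b, 1):
            cur = min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
            prev, dp[j] = dp[j], cur
    return dp[-1]

def dist_leq1_equal_len(a, b):
    """Equal-length fast path: distance <= 1 means at most one substitution."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches if mismatches <= 1 else None

random.seed(0)
for _ in range(10_000):
    m = random.randint(1, 6)
    a = "".join(random.choice("ab") for _ in range(m))
    b = "".join(random.choice("ab") for _ in range(m))
    d = levenshtein_ref(a, b)
    assert dist_leq1_equal_len(a, b) == (d if d <= 1 else None), (a, b)
print("fuzz passed")
```

A two-letter alphabet on short strings hits the edge cases (ties, runs of equal characters) far more often than random ASCII would.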
Complexity & real-world usage
The reference implementation above costs Θ((n −
m + 1) · m²) time in
the worst case — exactly the kind of redundancy Bitap-class algorithms
eliminate. A careful fuzzy Bitap runs in
O(n · k · ⌈m / w⌉)
word operations for text length n, edit budget
k, pattern length m, and machine word
w — i.e. linear in n with
multiplicative factors in k and ⌈m/w⌉,
instead of Θ(m²) DP work per sliding window.
Production fuzzy matchers preserve the same Levenshtein semantics but
drive the cost down with those bit-parallel updates.
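As a back-of-envelope comparison (pure operation counting, not a benchmark, and treating one DP cell and one word update as comparable units, which flatters the DP side):

```python
import math

def dp_ops(n, m):
    """Rough DP cell count for the windowed baseline: (n - m + 1) * m^2."""
    return (n - m + 1) * m * m

def bitap_ops(n, m, k, w=64):
    """Rough word-update count for fuzzy Bitap: one state word per error
    tier 0..k, ceil(m / w) words each, updated once per text character."""
    return n * (k + 1) * math.ceil(m / w)

n, m, k = 1_000_000, 20, 2
print(dp_ops(n, m))        # 399_992_400 DP cells
print(bitap_ops(n, m, k))  # 3_000_000 word updates
```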
Where is Bitap used? The agrep Unix command, grep’s
fuzzy sibling, uses it under the hood. Also, some code editors and
search tools rely on similar algorithms for typo-tolerant search.
Wrapping up
We started with brute force, zoomed in on how shift-or packs NFA
states into integers, added a working exact Bitap
plus a tight k = 1 window check, and
kept the full DP slice as the interview’s verifiable
scoring model. Pair the intuition with code you can fuzz-test; once
the baseline is trusted, any bit-parallel shortcut has to match it
bit-for-bit on the distances. Next time you see a “Did you mean …?”
feature, you’ll know one way it might be implemented.
Happy coding, and may your searches always be fuzzy in the right way!
References & further reading
- Ricardo Baeza-Yates and Gaston H. Gonnet — "A New Approach to Text Searching" — original Communications of the ACM paper that introduced the Bitap / shift-or line of work.
- Bitap algorithm (Wikipedia) — compact overview of exact and fuzzy variants, with the usual bit-mask intuition.
- Gene Myers — “A fast bit-vector algorithm for approximate string matching based on dynamic programming” — classic bit-vector treatment; useful context next to shift-or Bitap.
Can't get enough from Lumænaut_? Read this...
- Grinding LeetCode — Day 1: Two Sum — a beginner-friendly walkthrough from brute force to an O(n) hash map solution.
- The 8 Algorithm Paradigms — eight mental models you’ll keep reusing across problems, interviews, and production code.
- Everything I Don't Know About Blogging — the behind-the-scenes logbook: domains, SEO, and learning in public.