Replace regex-based LIKE matching with a linear-time direct matcher#4235

Open
arnaud-lacurie wants to merge 4 commits into FoundationDB:main from arnaud-lacurie:fix-like-redos

@arnaud-lacurie

Collaborator

Summary

  • PatternForLikeValue was translating SQL LIKE patterns into Java regex strings (% → .*, _ → .), and LikeOperatorValue was compiling them via Pattern.compile() on every evaluated row. Java's NFA engine backtracks super-polynomially on patterns like %a%a%a%, which is a CPU-exhaustion risk for any caller that can submit a LIKE query.
  • Replace the entire regex pipeline with a two-pointer iterative LIKE matcher that runs in O(n·m) with no backtracking. PatternForLikeValue.eval() now emits a normalized LIKE pattern (%/_ as wildcards, \ as internal escape) rather than a regex string. Pattern.compile is removed entirely.
  • All 75 existing LikeOperatorValueTest cases pass unchanged.
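
For readers unfamiliar with the technique, here is a minimal sketch of a two-pointer LIKE matcher of the kind described above. It is not the code in this PR: the class name `LikeMatcherSketch`, the method signature, the treatment of a trailing `\` as a literal backslash, and matching `_` against one UTF-16 code unit are all assumptions made for illustration.

```java
// Hypothetical illustration only; not the implementation from this PR.
final class LikeMatcherSketch {
    private LikeMatcherSketch() {}

    /**
     * Matches text against a normalized LIKE pattern where '%' matches any run
     * of characters, '_' matches exactly one character, and '\' escapes the
     * next character. Runs in O(n*m) time and uses no regex.
     */
    static boolean matches(String text, String pattern) {
        int t = 0;       // current position in text
        int p = 0;       // current position in pattern
        int starP = -1;  // pattern position just after the most recent '%'
        int starT = -1;  // text position the most recent '%' resumes from

        while (t < text.length()) {
            if (p < pattern.length()) {
                char pc = pattern.charAt(p);
                // The wildcard check comes before any literal comparison.
                if (pc == '%') {
                    starP = ++p;
                    starT = t;
                    continue;
                }
                if (pc == '_') {
                    t++;
                    p++;
                    continue;
                }
                // An escape makes the following character a literal.
                // Assumption: a trailing '\' is treated as a literal backslash.
                int lit = (pc == '\\' && p + 1 < pattern.length()) ? p + 1 : p;
                if (pattern.charAt(lit) == text.charAt(t)) {
                    t++;
                    p = lit + 1;
                    continue;
                }
            }
            // Mismatch or pattern exhausted: let the last '%' absorb one more
            // text character, or fail if there is no '%' to fall back on.
            if (starP < 0) {
                return false;
            }
            p = starP;
            t = ++starT;
        }
        // Text is consumed; only trailing '%' may remain in the pattern.
        while (p < pattern.length() && pattern.charAt(p) == '%') {
            p++;
        }
        return p == pattern.length();
    }
}
```

Only the most recent `%` is ever revisited, and each revisit advances its resume point in the text by one, which is what bounds the work at O(n·m). Under these assumptions, `matches("abc", "a%c")` returns true, `matches("abc", "a_")` returns false, and `matches("50%", "50\\%")` returns true because the escaped `%` is compared as a literal.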

PatternForLikeValue previously translated SQL LIKE patterns into Java
regex strings (% → .*, _ → .) and LikeOperatorValue compiled them via
Pattern.compile() per evaluated row. Java's NFA engine backtracks
super-polynomially on patterns like %a%a%a%, making this a CPU-exhaustion
vector for any caller that can submit a LIKE query.

Replace the regex pipeline with a two-pointer iterative LIKE matcher that
runs in O(n·m) time with no backtracking. PatternForLikeValue now emits a
normalized LIKE pattern (% and _ as wildcards, \ as escape) instead of a
regex string. Pattern.compile is gone entirely.
@arnaud-lacurie arnaud-lacurie added the bug fix Change that fixes a bug label May 27, 2026
@arnaud-lacurie arnaud-lacurie marked this pull request as draft May 27, 2026 23:54
When text contained '%' at a position where the pattern also had '%', the
literal equality branch fired before the wildcard branch, causing '%' in
the pattern to be consumed as a literal match. Move the '%' wildcard check
before the literal equality check and add regression test cases.
@arnaud-lacurie arnaud-lacurie marked this pull request as ready for review June 6, 2026 22:45