perf(codegen): Use char codes for simple classes by scttcper · Pull Request #660 · peggyjs/peggy

scttcper · 2026-05-14T14:46:25Z

This optimizes generated parser code for simple character classes.

A lot of grammars have hot loops that look like this:

```
identifier = [a-zA-Z_$] [a-zA-Z0-9_$]*
number = [0-9]+ ("." [0-9]+)?
whitespace = [ \t\n\r]*
```

Peggy currently emits a regexp test for every character those classes consume. This PR instead emits direct `charCodeAt` comparisons when the class is simple enough for them to be exactly equivalent: no `i` flag, no unicode mode, and only single-code-unit characters and ranges.
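To make the difference concrete, here is a hand-written sketch of both strategies for the `[a-zA-Z_$]` class from the `identifier` rule. It is a simplified illustration, not Peggy's actual generated output: `matchWithRegexp` and `matchWithCharCodes` are hypothetical names, and a real generated parser also tracks the current position and failure expectations.

```javascript
// Hypothetical, simplified matchers for the class [a-zA-Z_$].
// Neither function is Peggy's real generated code.

const classRe = /[a-zA-Z_$]/;

// Regexp path: test the single character at `pos` against the class.
function matchWithRegexp(input, pos) {
  return classRe.test(input.charAt(pos));
}

// charCodeAt path: read the UTF-16 code unit once, then compare it
// against each range or single character in the class.
function matchWithCharCodes(input, pos) {
  const c = input.charCodeAt(pos); // NaN past the end, so every comparison is false
  return (
    (c >= 97 && c <= 122) || // a-z
    (c >= 65 && c <= 90) || // A-Z
    c === 95 || // _
    c === 36 // $
  );
}
```

Because `charCodeAt` returns `NaN` past the end of the input, every comparison fails there, which matches the regexp path's behavior on the empty string that `charAt` returns.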

Anything more complicated still uses the existing regexp path.
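The decision between the two paths could look roughly like the sketch below. The class shape (`parts`, `ignoreCase`, `unicode`) and the helper names are assumptions made for illustration; they are not Peggy's internal AST or compiler API. Inverted classes such as `[^0-9]` are left out to keep it short.

```javascript
// Hypothetical class description (not Peggy's AST):
//   parts: single characters ("a") or [low, high] ranges (["a", "z"])
//   ignoreCase / unicode: the class flags

// A class is "simple" when a plain code-unit comparison is exactly
// equivalent to the regexp: no case folding, no unicode mode, and
// every member is a single UTF-16 code unit.
function isSimpleClass({ parts, ignoreCase, unicode }) {
  if (ignoreCase || unicode) return false;
  return parts.every((part) =>
    Array.isArray(part)
      ? part.every((ch) => ch.length === 1)
      : part.length === 1
  );
}

// Build a JavaScript condition over `varName`, which is assumed to hold
// the current char code. Returns null when the regexp path must be kept.
function emitCondition(cls, varName = "c") {
  if (!isSimpleClass(cls)) return null;
  if (cls.parts.length === 0) return "false"; // an empty class matches nothing
  return cls.parts
    .map((part) => {
      if (Array.isArray(part)) {
        const [lo, hi] = part;
        return `(${varName} >= ${lo.charCodeAt(0)} && ${varName} <= ${hi.charCodeAt(0)})`;
      }
      return `${varName} === ${part.charCodeAt(0)}`;
    })
    .join(" || ");
}
```

For example, `emitCondition({ parts: [["a", "z"], "_"], ignoreCase: false, unicode: false })` returns `(c >= 97 && c <= 122) || c === 95`, while setting `ignoreCase` to `true` returns `null`. The `length === 1` check is what rules out astral characters, which occupy two UTF-16 code units and so cannot be matched by a single `charCodeAt` comparison.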

As one real-world benchmark, I used Sentry’s search grammar, which is the grammar we use for parsing search bar queries: https://github.com/getsentry/sentry/blob/master/static/app/components/searchSyntax/grammar.pegjs. That grammar has a lot of key/value filters, so longer searches spend plenty of time in these character-class loops.

| Input | Before (ms) | After (ms) | Time reduction |
|---|---|---|---|
| 200 short search strings, cycling through common filter/query shapes | 3.1664 | 2.5784 | 18.6% |
| One 2.7 KB query with 120 mixed free-text, filter, and grouped terms | 0.7804 | 0.6653 | 14.7% |
| Two copies of the long query joined with `AND`, about 5.4 KB | 1.6284 | 1.4060 | 13.7% |
| The long query plus a parenthesized copy joined with `OR`, about 5.4 KB | 1.6405 | 1.3890 | 15.3% |
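To get a rough local feel for the per-character cost, a micro-benchmark along these lines compares a regexp-driven `[0-9]*` loop with a `charCodeAt` loop. This is a hypothetical sketch, not the benchmark behind the numbers above. It relies on the global `performance` object available in modern Node.js and browsers, and micro-benchmark results vary with engine and input.

```javascript
// Hypothetical micro-benchmark sketch for a [0-9]* loop.

const digitRe = /[0-9]/;

// Consume a run of [0-9]* starting at `pos` and return the end position.
function scanDigitsRegexp(input, pos) {
  while (pos < input.length && digitRe.test(input.charAt(pos))) pos++;
  return pos;
}

function scanDigitsCharCode(input, pos) {
  while (pos < input.length) {
    const c = input.charCodeAt(pos);
    if (c < 48 || c > 57) break; // outside '0'..'9'
    pos++;
  }
  return pos;
}

// Average time per call in milliseconds over `iterations` runs.
function bench(fn, input, iterations = 2000) {
  let sink = 0; // accumulate results so the work is not optimized away
  const start = performance.now();
  for (let i = 0; i < iterations; i++) sink += fn(input, 0);
  const elapsed = performance.now() - start;
  return { msPerCall: elapsed / iterations, sink };
}

const sample = "1234567890".repeat(100) + "x";
console.log("regexp  ", bench(scanDigitsRegexp, sample).msPerCall);
console.log("charCode", bench(scanDigitsCharCode, sample).msPerCall);
```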

For that generated parser, the repeated class-check code went from 177 `charCodeAt` calls to 76, because each check computes the character code once and reuses it.
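The reuse described here can be sketched as two ways of emitting the same check. Both emitters and the `ranges` format are hypothetical; the sketch only illustrates why reading the code once cuts down on `charCodeAt` calls, and it does not reproduce the 177 and 76 counts.

```javascript
// Hypothetical emitters. `ranges` is an array of [lowCode, highCode] pairs;
// a single character is a range whose low and high codes are equal.

// Naive form: every comparison re-reads the character code.
function emitNaive(ranges) {
  if (ranges.length === 0) return "false";
  return ranges
    .map(([lo, hi]) =>
      lo === hi
        ? `input.charCodeAt(pos) === ${lo}`
        : `(input.charCodeAt(pos) >= ${lo} && input.charCodeAt(pos) <= ${hi})`
    )
    .join(" || ");
}

// Temp form: read the code once into `c`, then compare against it.
// The emitted expression assumes `c` is declared in the enclosing scope.
function emitWithTemp(ranges) {
  if (ranges.length === 0) return "false";
  const body = ranges
    .map(([lo, hi]) => (lo === hi ? `c === ${lo}` : `(c >= ${lo} && c <= ${hi})`))
    .join(" || ");
  return `(c = input.charCodeAt(pos), ${body})`;
}

// Count how often the emitted source calls charCodeAt.
function countCharCodeAt(src) {
  return src.split("charCodeAt(").length - 1;
}

const identStart = [[97, 122], [65, 90], [95, 95], [36, 36]]; // [a-zA-Z_$]
```

For `[a-zA-Z_$]`, the naive form contains six `charCodeAt` calls and the temp form contains one.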

Simple character classes were emitted as regexp tests for every accepted character. Hot parser loops like keys and identifiers pay for that over and over. Emit direct charCodeAt comparisons for simple non-unicode, case-sensitive classes and reuse a temp char code inside each check. Keep the regexp path for the complex cases.

Co-Authored-By: Codex GPT-5 <noreply@openai.com>

scttcper · 2026-05-14T15:08:34Z

Let me know if you need the benchmark or anything else.

scttcper marked this pull request as ready for review May 14, 2026 15:08

scttcper wants to merge 1 commit into peggyjs:main from scttcper:scttcper/charcode-classes