
Commit ca2a56b

docs: README — the emitted parser need not be JS (issue #6)
Documents the target-agnostic emitter under "A language-agnostic engine": one analysis produces one IR, which each target renders (Go/Rust/native, each with its own regex-free lexer). The real javascript.ts and typescript.ts grammars emit to TypeScript, Go, and Rust with output byte-identical to the interpreter, kept that way by a CI gate. The Rust/Go throughput results and the ASCII-only offset boundary are noted too.
1 parent cd4ebc8 commit ca2a56b

1 file changed

Lines changed: 15 additions & 0 deletions


README.md

@@ -338,6 +338,21 @@ const Regex = token(seq(

[`test/agnostic.ts`](test/agnostic.ts) proves it directly — the same engine parses a toy grammar whose identifier token is `Word`, with no templates or regex. The deeper proof is [`html.ts`](html.ts): markup shares *nothing* with TypeScript's token stream, yet the same engine handles it.

### The emitted parser need not be JS — Go, Rust, native
343+
The grammar also derives a **standalone parser in another language**. [`emitPortableParser(grammar, target)`](src/emit-portable.ts) runs a single analysis that produces one language-agnostic IR, and each `Target` renders that IR, including a regex-free lexer of its own. The output therefore has no dependency on the JS runtime and compiles offline:
```ts
import { writeFileSync } from 'node:fs';
import { emitPortableParser } from './src/emit-portable.ts';
import { goTarget } from './src/target-go.ts';
import { rustTarget } from './src/target-rust.ts';

// `grammar` is any grammar definition, e.g. the one in javascript.ts
writeFileSync('parser.go', emitPortableParser(grammar, goTarget));   // `go build`, no deps
writeFileSync('parser.rs', emitPortableParser(grammar, rustTarget)); // `rustc`, no crates
```
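To illustrate the "one IR, rendered per target" split, here is a minimal TypeScript sketch. Everything in it (`CharClass`, `IdentRule`, `SketchTarget`, `sketchGoTarget`, `sketchRustTarget`) is hypothetical and is not the IR or the `Target` type that `emitPortableParser` actually uses. One identifier rule is described once, as data, and two targets render it into a regex-free scanning function; they differ only in syntax, such as Go's `'a'` rune constant versus Rust's `b'a'` byte literal.

```typescript
// Hypothetical sketch: NOT the project's actual IR or Target interface.
// One token rule is described once, as data, and each target renders it.

type CharClass =
  | { kind: 'char'; ch: string }                 // a single ASCII character
  | { kind: 'range'; from: string; to: string }; // an inclusive ASCII range

interface IdentRule {
  name: string;
  first: CharClass[]; // classes allowed for the first character
  rest: CharClass[];  // classes allowed for every following character
}

interface SketchTarget {
  // Render a regex-free scanner: (source bytes, start offset) -> end offset.
  renderScanner(rule: IdentRule): string;
}

// Build a boolean test on a byte variable `c`, using the target's literal syntax.
// Assumes plain ASCII characters that need no escaping inside quotes.
function classTest(classes: CharClass[], lit: (ch: string) => string): string {
  return classes
    .map((c) =>
      c.kind === 'char'
        ? `c == ${lit(c.ch)}`
        : `(c >= ${lit(c.from)} && c <= ${lit(c.to)})`,
    )
    .join(' || ');
}

const sketchGoTarget: SketchTarget = {
  renderScanner(rule) {
    const lit = (ch: string) => `'${ch}'`; // untyped rune constant, comparable with a byte
    return [
      `func scan${rule.name}(src []byte, pos int) int {`,
      `\tif pos >= len(src) {`,
      `\t\treturn pos`,
      `\t}`,
      `\tif c := src[pos]; !(${classTest(rule.first, lit)}) {`,
      `\t\treturn pos`,
      `\t}`,
      `\ti := pos + 1`,
      `\tfor i < len(src) {`,
      `\t\tif c := src[i]; !(${classTest(rule.rest, lit)}) {`,
      `\t\t\tbreak`,
      `\t\t}`,
      `\t\ti++`,
      `\t}`,
      `\treturn i`,
      `}`,
    ].join('\n');
  },
};

const sketchRustTarget: SketchTarget = {
  renderScanner(rule) {
    const lit = (ch: string) => `b'${ch}'`; // u8 byte literal
    return [
      `fn scan_${rule.name.toLowerCase()}(src: &[u8], pos: usize) -> usize {`,
      `    if pos >= src.len() {`,
      `        return pos;`,
      `    }`,
      `    let c = src[pos];`,
      `    if !(${classTest(rule.first, lit)}) {`,
      `        return pos;`,
      `    }`,
      `    let mut i = pos + 1;`,
      `    while i < src.len() {`,
      `        let c = src[i];`,
      `        if !(${classTest(rule.rest, lit)}) {`,
      `            break;`,
      `        }`,
      `        i += 1;`,
      `    }`,
      `    i`,
      `}`,
    ].join('\n');
  },
};

// Usage: an identifier rule equivalent to [A-Za-z_][A-Za-z0-9_]*
const letters: CharClass[] = [
  { kind: 'range', from: 'a', to: 'z' },
  { kind: 'range', from: 'A', to: 'Z' },
  { kind: 'char', ch: '_' },
];
const ident: IdentRule = {
  name: 'Ident',
  first: letters,
  rest: [...letters, { kind: 'range', from: '0', to: '9' }],
};

console.log(sketchGoTarget.renderScanner(ident));
console.log(sketchRustTarget.renderScanner(ident));
```

Because both renderers read the same `IdentRule`, changing the rule changes every target at once; only surface syntax lives in the target.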
The proof is the two full languages. The real [`javascript.ts`](javascript.ts) and [`typescript.ts`](typescript.ts) grammars, including the `[Await]/[Yield]` fork, left recursion, the regex/division and template state machines, arrow functions, and the TS type grammar, emit to **TypeScript, Go, and Rust**, and every CST is byte-identical to the reference interpreter's. [`test/portable-targets.ts`](test/portable-targets.ts) compiles and runs all three targets for sixteen grammars (the two real languages plus focused fixtures) on every CI run.

On the same corpus, the Rust output matches the throughput of [oxc](https://github.com/oxc-project/oxc) and the Go output outperforms [tsgo](https://github.com/microsoft/typescript-go); an arena keeps both close to zero allocations. Offsets are the one boundary: the Go and Rust parsers are byte-based and report UTF-8 offsets, which match the JS interpreter's for ASCII input. For non-ASCII text they inherently differ, because JavaScript string offsets count UTF-16 code units.
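A small TypeScript sketch makes the ASCII boundary concrete: a JavaScript string index counts UTF-16 code units, while a byte-based Go or Rust parser counts UTF-8 bytes. `utf8OffsetOf` is a hypothetical helper, not part of this repo; it uses the standard `TextEncoder` to re-measure a string prefix in bytes.

```typescript
// Hypothetical helper, not part of this repo: maps a JavaScript string offset
// (UTF-16 code units) to the UTF-8 byte offset a byte-based parser would report.
// Assumes `utf16Offset` does not fall inside a surrogate pair.
function utf8OffsetOf(text: string, utf16Offset: number): number {
  return new TextEncoder().encode(text.slice(0, utf16Offset)).length;
}

const ascii = 'let x = 1;';
// \u00e9 is 2 UTF-8 bytes; \u{1F600} is 2 UTF-16 code units and 4 UTF-8 bytes.
const nonAscii = 'let \u00e9 = "\u{1F600}";';

// ASCII: the two offsets agree.
console.log(ascii.indexOf('='), utf8OffsetOf(ascii, ascii.indexOf('='))); // 6 6
// Non-ASCII: the offsets diverge after the first non-ASCII character.
console.log(nonAscii.indexOf('='), utf8OffsetOf(nonAscii, nonAscii.indexOf('='))); // 6 7
console.log(nonAscii.indexOf(';'), utf8OffsetOf(nonAscii, nonAscii.indexOf(';'))); // 12 15
```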
## Adding a language

A new language is **one grammar file** on the unchanged engine:
