## Phase F slice 8b — type-directed string `++` lowering
This is the **full fix** that the slice-8a guard (#575) stood in for.
String `++` now lowers **correctly and completely** to wasm, including
pure variable-to-variable `a ++ b`, which the syntactic guard could not
reach.
### The channel (type-directed elaboration)
- **`ast.ml`**: new `ExprStringConcat of expr * expr` (never produced by
the parser).
- **`typecheck.ml`**: `synth` records each `++` node it types as String
concat **by physical identity** (`string_concat_sites`);
`elaborate_string_concat` rewrites exactly those nodes to
`ExprStringConcat`. Physical-identity keying is sound because typecheck
and codegen run over the **same** `prog` object (`parse_with_face`'s
lowered prog, shared by resolve/typecheck/codegen). `ExprBinary` carries
no span, so same-text `++` occurrences are value-equal and structural
equality cannot tell them apart; `==` is the correct key.
- **`bin/main.ml`**: the wasm path runs `elaborate_string_concat` after
typecheck, before `Opt.fold`. The **interpreter and non-wasm backends
keep the original `prog`** (`ExprBinary _ OpConcat _`), so the oracle is
unchanged and only the wasm backend sees the new node.
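
To illustrate why the key has to be physical rather than structural, here is a minimal sketch. It assumes a cut-down `expr` type and a list-backed `string_concat_sites`; the `record_site`, `is_recorded` and `mk_concat` helpers are hypothetical, and the real `ast.ml` and `typecheck.ml` are certainly richer than this.

```ocaml
(* Hypothetical mini-AST; the real ast.ml has many more constructors. *)
type binop = OpConcat | OpAdd

type expr =
  | ExprVar of string
  | ExprBinary of expr * binop * expr
  | ExprStringConcat of expr * expr

(* Assumed representation: nodes typed as String concat, kept in a list
   and compared with physical equality (==). *)
let string_concat_sites : expr list ref = ref []

let record_site (e : expr) : unit =
  string_concat_sites := e :: !string_concat_sites

let is_recorded (e : expr) : bool =
  List.exists (fun site -> site == e) !string_concat_sites

(* Rewrite exactly the recorded nodes; leave every other node alone. *)
let rec elaborate_string_concat (e : expr) : expr =
  match e with
  | ExprBinary (l, OpConcat, r) when is_recorded e ->
      ExprStringConcat (elaborate_string_concat l, elaborate_string_concat r)
  | ExprBinary (l, op, r) ->
      ExprBinary (elaborate_string_concat l, op, elaborate_string_concat r)
  | ExprStringConcat (l, r) ->
      ExprStringConcat (elaborate_string_concat l, elaborate_string_concat r)
  | ExprVar _ -> e

(* Two same-text occurrences of a ++ b. Sys.opaque_identity keeps an
   optimizing compiler from sharing the two equal values. *)
let mk_concat (a : string) (b : string) : expr =
  ExprBinary (ExprVar a, OpConcat, ExprVar b)

let site_str = mk_concat (Sys.opaque_identity "a") (Sys.opaque_identity "b")
let site_list = mk_concat (Sys.opaque_identity "a") (Sys.opaque_identity "b")

(* Pretend the typechecker typed only the first occurrence as String. *)
let () = record_site site_str
```

In this sketch `site_str = site_list` holds but `site_str == site_list` does not, so only `site_str` is rewritten to `ExprStringConcat`. A structurally keyed table would have rewritten both.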
### The lowering (`codegen.ml`)
Byte concat: allocate `4 + la + lb` bytes, write the length word, then
copy a's bytes followed by b's. This mirrors the list-concat handler but
uses **1-byte elements + a single length word** instead of 4-byte i32
elements. That i32-element copy was exactly the bug: a string's
`[len][utf8]` layout was copied as i32 elements, so in `"ab" ++ "cd"`
byte 2 of the result held the length word of `"cd"` (= 2) instead of
`'c'` (= 99).
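
The byte-level layout can be modelled outside wasm. The sketch below uses OCaml `Bytes` as a stand-in for linear memory and assumes a little-endian 4-byte length word holding the payload's byte length; `alloc_string`, `length_of`, `payload` and `concat_bytes` are hypothetical names for illustration, not functions from `codegen.ml`.

```ocaml
(* Model of the [len][utf8] layout: a 4-byte length word, then raw bytes.
   Little-endian length is an assumption of this sketch. *)
let alloc_string (s : string) : Bytes.t =
  let n = String.length s in
  let buf = Bytes.create (4 + n) in
  Bytes.set_int32_le buf 0 (Int32.of_int n);
  Bytes.blit_string s 0 buf 4 n;
  buf

let length_of (buf : Bytes.t) : int =
  Int32.to_int (Bytes.get_int32_le buf 0)

let payload (buf : Bytes.t) : string =
  Bytes.sub_string buf 4 (length_of buf)

(* Byte concat: one length word, then a's bytes, then b's bytes.
   Each element is 1 byte wide, unlike the 4-byte i32 list elements. *)
let concat_bytes (a : Bytes.t) (b : Bytes.t) : Bytes.t =
  let la = length_of a and lb = length_of b in
  let out = Bytes.create (4 + la + lb) in
  Bytes.set_int32_le out 0 (Int32.of_int (la + lb));
  Bytes.blit a 4 out 4 la;
  Bytes.blit b 4 out (4 + la) lb;
  out
```

With this model, `payload (concat_bytes (alloc_string "ab") (alloc_string "cd"))` is `"abcd"`, and payload byte 2 is `'c'` (99), which is the value the regression test checks.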
### Effect-ordinal parity (`effect_sites.ml`)
`ExprStringConcat` recurses like `ExprBinary` and is **not** counted as
an `ExprApp` call site, so effect-ordinals stay identical between interp
(sees `ExprBinary`) and wasm (sees `ExprStringConcat`) — avoiding a
#555-class desync. An intrinsic-call encoding (`ExprApp
"__string_concat"`) would have shifted the ordinals; the dedicated node
avoids that. `opt.ml` folds the sub-expressions of `ExprStringConcat`;
`interp.ml` handles the node defensively.
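
The ordinal argument can be sketched with a toy traversal. This assumes, purely for illustration, that ordinals are assigned to `ExprApp` nodes in pre-order; the real numbering in `effect_sites.ml` may differ, and the `ordinals` function and the `get_name` / `get_suffix` callees are hypothetical.

```ocaml
(* Hypothetical mini-AST; ExprBinary here stands in for the ++ case only. *)
type expr =
  | ExprVar of string
  | ExprApp of string * expr list
  | ExprBinary of expr * expr
  | ExprStringConcat of expr * expr

(* Assign an ordinal to each ExprApp in pre-order (an assumption of this
   sketch). Both concat forms just recurse and consume no ordinal. *)
let ordinals (e : expr) : (string * int) list =
  let next = ref 0 in
  let acc = ref [] in
  let rec go e =
    match e with
    | ExprVar _ -> ()
    | ExprApp (f, args) ->
        acc := (f, !next) :: !acc;
        incr next;
        List.iter go args
    | ExprBinary (l, r) | ExprStringConcat (l, r) ->
        go l;
        go r
  in
  go e;
  List.rev !acc

let lhs = ExprApp ("get_name", [])
let rhs = ExprApp ("get_suffix", [])

let interp_view = ExprBinary (lhs, rhs)          (* what the interpreter sees *)
let wasm_view = ExprStringConcat (lhs, rhs)      (* what the wasm backend sees *)
let intrinsic_view = ExprApp ("__string_concat", [lhs; rhs])  (* rejected encoding *)
```

Here `ordinals interp_view` and `ordinals wasm_view` both give `get_name` ordinal 0 and `get_suffix` ordinal 1, while `ordinals intrinsic_view` spends ordinal 0 on `__string_concat` and shifts the other two to 1 and 2.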
The **8a guard is retained as a backstop**: any String `++` reaching
codegen un-elaborated still errors loudly rather than emitting garbage.
### Tests / verification
- `tests/codegen/string_concat.{affine,mjs}` — executable wasm parity,
byte-exact via the slice-1 reader: the **`"ab" ++ "cd"` byte-2 = 99
regression** (was 2), the **var-var** case the guard could not catch,
**chained** `a ++ b ++ c`, and **empty** operands (oracle 6513269).
- `test/test_e2e.ml` "E2E String-wall slice 8 guard" gains a
*lowers-after-elaboration* case.
- Full `run_codegen_wasm_tests.sh` green incl. `list_concat` + slices
1-7 + effect tests; string `++` verified correct in if/match/fn/nested
contexts. (`dune runtest` not runnable in-sandbox — no `alcotest`; the
codegen `.mjs` parity goes through the real CLI pipeline.)
### Migration impact
This closes the **string wall**'s last op: every name-dispatched string
builtin (slices 1-7) + concatenation (8) now lower to wasm. The next
compiler half is the **effect wall** (≈111 effect-gated corpus files).
Builds on #575 (guard, merged) and #574 (design, merged).
https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s
---
_Generated by [Claude
Code](https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s)_
Co-authored-by: Claude <noreply@anthropic.com>