// SPDX-License-Identifier: MPL-2.0
// SPDX-FileCopyrightText: 2025-2026 hyperpolymath
= Phase F, string-wall slice 2: string_from_char_code (evidence)
:toc: macro

[IMPORTANT]
====
*Slice landed: the write side of the string ABI.* `string_from_char_code(n)`,
already wired in resolve/typecheck/interp, gained a wasm-backend lowering
that *bump-allocates* a one-byte string `[len: i32 LE = 1][byte]` on the heap,
where the byte is the low 8 bits of `n`. This is the first heap-allocating
string op, and it establishes the write side of the `[len][utf8]` ABI whose
read side landed in slice 1.

This is the second slice of the *variable-string backend* wall
(`proposals/MIGRATION-PLAN.adoc` §"The two walls", Phase F). The remaining
write-side ops (`string_sub`/`slice`/concat/case-fold) need *runtime-length*
allocation and are deferred to slice 3, which will extract the shared
runtime-length helper once two real callers fix its signature.
====

toc::[]

== What was missing

`string_from_char_code` was wired end-to-end except the wasm backend:

[cols="2,1,1,1,1",options="header"]
|===
| Builtin | resolve.ml | typecheck.ml | interp.ml | codegen.ml (wasm)
| `string_char_code_at` | ✓ | ✓ | ✓ | ✓ (slice 1)
| `char_to_int` | ✓ | ✓ | ✓ | ✓ (slice 1)
| `string_from_char_code` | ✓ | ✓ | ✓ | *was missing -> added*
|===

Before this slice it failed at codegen:

----
Code generation error: (Codegen.UnboundVariable
  "Function or variable not found: string_from_char_code")
----

== The lowering (lib/codegen.ml)

The interpreter oracle (`lib/interp.ml`) defines `string_from_char_code(n)` as
`String.make 1 (Char.chr (n land 0xff))`. The wasm lowering builds the same
one-byte string on the heap:

----
n_code; LocalSet val                              ;; val = n (n_code evaluates the argument)
gen_heap_alloc 5; LocalSet ptr                    ;; ptr = bump-allocated base (4-byte length + 1 byte)
LocalGet ptr; I32Const 1; I32Store (offset=0)     ;; [ptr+0] = length 1 (i32, little-endian)
LocalGet ptr; LocalGet val; I32Store8 (offset=4)  ;; [ptr+4] = low byte of n
LocalGet ptr                                      ;; result = the string pointer
----
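
To make the `[len][utf8]` layout concrete, here is a minimal OCaml model of the write and read sides. It assumes a `Bytes` buffer stands in for wasm linear memory and a mutable offset for the bump allocator; `heap`, `heap_top`, `bump_alloc`, `string_from_char_code_on_heap`, `string_length_on_heap` and `char_code_at_on_heap` are hypothetical names for illustration, not identifiers from `lib/codegen.ml`, and the model does no alignment or memory growth.

[source,ocaml]
----
(* Hypothetical model: a Bytes buffer stands in for wasm linear memory and
   [heap_top] for the bump allocator's next free offset. *)
let heap = Bytes.make 64 '\000'
let heap_top = ref 0

(* Bump allocation: hand out the current top, then advance it by [size].
   No alignment or growth is modelled. *)
let bump_alloc size =
  let ptr = !heap_top in
  heap_top := ptr + size;
  ptr

(* Write side, following the lowering: allocate 5 bytes, store length 1 as a
   little-endian i32 at ptr+0, store the low byte of [n] at ptr+4. *)
let string_from_char_code_on_heap n =
  let ptr = bump_alloc 5 in
  Bytes.set_int32_le heap ptr 1l;
  Bytes.set_uint8 heap (ptr + 4) (n land 0xff); (* the bits I32Store8 keeps *)
  ptr

(* Read side, in the style of slice 1: the length comes from the header and
   each byte from an unsigned 8-bit load; out-of-range indices give -1. *)
let string_length_on_heap ptr = Int32.to_int (Bytes.get_int32_le heap ptr)

let char_code_at_on_heap ptr i =
  if i < 0 || i >= string_length_on_heap ptr then -1
  else Bytes.get_uint8 heap (ptr + 4 + i)
----

In this model two consecutive calls return pointers 5 bytes apart, matching the size passed to `gen_heap_alloc`, and reading index 1 of a fresh string yields `-1`, as in the OOB row of the Gate 2 table.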

Because `I32Store8` writes only the low 8 bits of its operand, it performs the
`land 0xff` masking itself, including for negative `n`: `-1` stores `0xFF`,
which slice 1's `I32Load8U` reads back as `255`. No explicit mask instruction
is needed.
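
The masking argument can be checked mechanically. This OCaml sketch assumes `n` reaches wasm as an `i32`, so it wraps the OCaml `int` to 32 bits first, keeps the low 8 bits as `I32Store8` would, and reads them back zero-extended as `I32Load8U` would. Since wrapping to 32 bits never changes the low 8 bits, the result should equal the oracle's `n land 0xff` for every input. `wasm_store8_then_load8u` and `interp_byte` are hypothetical names.

[source,ocaml]
----
(* Hypothetical model of the wasm path: wrap the argument to an i32
   (Int32.of_int keeps it modulo 2^32 on 64-bit platforms), keep the low
   8 bits as I32Store8 does, and read them back zero-extended as
   I32Load8U does. *)
let wasm_store8_then_load8u n =
  Int32.to_int (Int32.logand (Int32.of_int n) 0xFFl)

(* The interp oracle's byte: Char.chr (n land 0xff), read back as a code. *)
let interp_byte n = Char.code (Char.chr (n land 0xff))

(* Agreement on the Gate 2 inputs and on values outside the i32 range. *)
let () =
  List.iter
    (fun n -> assert (wasm_store8_then_load8u n = interp_byte n))
    [66; 0; 255; 256; -1; 320; (1 lsl 40) + 7; min_int; max_int]
----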

The allocation reuses the existing bump allocator (`gen_heap_alloc`, the same
one closures and enum variants use), so no new memory machinery is
introduced.

== Gate evidence

=== Gate 1: builds

`dune build bin/main.exe` exits 0. The previously failing probe now compiles
and round-trips through slice 1's reader:

----
$ affinescript compile sfcc.affine -o sfcc.wasm   # string_char_code_at(string_from_char_code(66), 0)
Compiled sfcc.affine -> sfcc.wasm (WASM)
$ node ... main() => 66
----

=== Gate 2: parity (wasm vs interpreter oracle)

The same inputs give the same results through both backends, covering
masking, the NUL byte, negative inputs, overflow, length, and an
out-of-bounds read after construction. The interp oracle was confirmed
against the real library API; the wasm was executed under Node.

[cols="4,1,1",options="header"]
|===
| Expression | interp | wasm
| `scca(string_from_char_code(66), 0)` | 66 | 66
| `scca(string_from_char_code(0), 0)` (NUL) | 0 | 0
| `scca(string_from_char_code(255), 0)` | 255 | 255
| `scca(string_from_char_code(256), 0)` (mask) | 0 | 0
| `scca(string_from_char_code(-1), 0)` (low byte) | 255 | 255
| `scca(string_from_char_code(320), 0)` (mask) | 64 | 64
| `string_length(string_from_char_code(65))` | 1 | 1
| `scca(string_from_char_code(65), 1)` (OOB) | -1 | -1
|===

(`scca` = `string_char_code_at`, the slice-1 reader.)
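
The interp column of the table can also be replayed as an executable oracle check. This OCaml sketch mirrors the interp definition quoted in the lowering section; the `-1` for an out-of-range index is taken from the table's OOB row rather than from `lib/interp.ml`'s source, and `rows` and `mismatches` are hypothetical names. It covers the oracle side only; the wasm column comes from the Node harness.

[source,ocaml]
----
(* Hypothetical OCaml mirrors of the two builtins, following the interp
   oracle's definition; -1 for an out-of-range index comes from the OOB row. *)
let string_from_char_code n = String.make 1 (Char.chr (n land 0xff))

let string_char_code_at s i =
  if i < 0 || i >= String.length s then -1 else Char.code s.[i]

let scca = string_char_code_at

(* Each row: expression label, computed value, expected value from the table. *)
let rows = [
  "scca(sfcc(66), 0)",  scca (string_from_char_code 66) 0,    66;
  "scca(sfcc(0), 0)",   scca (string_from_char_code 0) 0,     0;
  "scca(sfcc(255), 0)", scca (string_from_char_code 255) 0,   255;
  "scca(sfcc(256), 0)", scca (string_from_char_code 256) 0,   0;
  "scca(sfcc(-1), 0)",  scca (string_from_char_code (-1)) 0,  255;
  "scca(sfcc(320), 0)", scca (string_from_char_code 320) 0,   64;
  "string_length(sfcc(65))", String.length (string_from_char_code 65), 1;
  "scca(sfcc(65), 1)",  scca (string_from_char_code 65) 1,    -1;
]

(* Rows whose computed value disagrees with the table. *)
let mismatches = List.filter (fun (_, got, want) -> got <> want) rows

let () =
  List.iter (fun (label, got, _) -> Printf.printf "%-26s %d\n" label got) rows;
  Printf.printf "mismatches: %d\n" (List.length mismatches)
----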

The packed fixture `tests/codegen/string_from_char_code.affine` returns
`4283202` (a positional pack of three round-tripped bytes plus the boundary
trio); both backends produce this value.

== Tests added

* `test/test_e2e.ml`, group *"E2E String-wall slice 2 (string_from_char_code)"*:
  seven interp-oracle cases (round-trip, NUL byte, high byte, mask-overflow,
  mask-negative, length, OOB-after-construction), run under `dune runtest`.
  This is the interp consumer coverage mandated by `.claude/CLAUDE.md`
  §"Test-fixture hygiene".
* `tests/codegen/string_from_char_code.affine` +
  `tests/codegen/test_string_from_char_code.mjs`: executable wasm parity,
  run by `tools/run_codegen_wasm_tests.sh` (CI).

A full `tools/run_codegen_wasm_tests.sh` run passes *all* codegen WASM tests,
including both the slice-1 and slice-2 string harnesses, with no regressions
in sibling tests.

== Corpus impact

This slice adds string *construction from a character code* (one byte, the
low 8 bits of `n`), the building block for any kernel that emits text one
byte at a time (manual int->string ladders, single-character separators,
byte-builders). Together with slice 1's indexing, both the read and the write
byte primitives are now present.
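
As an illustration of a byte-at-a-time kernel, here is a minimal OCaml sketch of a manual int->string ladder built from one-byte strings. It mirrors `string_from_char_code` with the interp oracle's definition; `digit_to_string` and `int_to_string` are hypothetical names, the input is assumed non-negative, and OCaml's `^` stands in for string concatenation.

[source,ocaml]
----
(* Mirror of the builtin, using the interp oracle's definition. *)
let string_from_char_code n = String.make 1 (Char.chr (n land 0xff))

(* One decimal digit (0..9) as a one-byte string; '0' has code 48. *)
let digit_to_string d = string_from_char_code (48 + d)

(* Manual int->string ladder for n >= 0: emit the last digit as a one-byte
   string and prepend the digits before it. [^] stands in for string
   concatenation, which the wasm backend gains only in a later slice. *)
let rec int_to_string n =
  if n < 10 then digit_to_string n
  else int_to_string (n / 10) ^ digit_to_string (n mod 10)
----

In an AffineScript version of this ladder, only the single-digit branch would lower to wasm after this slice, because it needs no concatenation.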

It does *not* yet unblock ops that copy *ranges* of bytes (`string_sub`,
`slice`) or *concatenate* (`++`, `string_concat`); those need runtime-length
allocation, the subject of slice 3.
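
One possible shape for the slice 3 helper, sketched in OCaml over the same kind of byte-buffer heap model: a runtime-length `alloc_string` that writes the length header, a byte-by-byte copy loop, and `string_sub`/`slice` as its two callers. Every name and signature here is hypothetical, since slice 3 is what fixes the real signature, and raising `Invalid_argument` on an out-of-range request is an assumption rather than the oracle's documented behaviour.

[source,ocaml]
----
(* Hypothetical heap model: a Bytes buffer plus a bump pointer. *)
let heap = Bytes.make 256 '\000'
let heap_top = ref 0

(* Runtime-length allocation: reserve a 4-byte header plus [len] bytes,
   write [len] as a little-endian i32, and return the string pointer. *)
let alloc_string len =
  let ptr = !heap_top in
  heap_top := ptr + 4 + len;
  Bytes.set_int32_le heap ptr (Int32.of_int len);
  ptr

let string_length ptr = Int32.to_int (Bytes.get_int32_le heap ptr)

(* Copy loop: one byte per iteration, the shape an I32Load8U / I32Store8
   loop in wasm would have. *)
let copy_bytes ~src ~dst ~count =
  for i = 0 to count - 1 do
    Bytes.set_uint8 heap (dst + i) (Bytes.get_uint8 heap (src + i))
  done

(* First caller: string_sub(s, start, len). Raising on an out-of-range
   request is an assumption, not the oracle's documented behaviour. *)
let string_sub ptr start len =
  if start < 0 || len < 0 || start + len > string_length ptr then
    invalid_arg "string_sub: range out of bounds";
  let out = alloc_string len in
  copy_bytes ~src:(ptr + 4 + start) ~dst:(out + 4) ~count:len;
  out

(* Second caller: slice(s, lo, hi) reduces to the same helper. *)
let slice ptr lo hi = string_sub ptr lo (hi - lo)

(* Helpers for trying it out: move OCaml strings onto and off the heap. *)
let of_string s =
  let n = String.length s in
  let ptr = alloc_string n in
  Bytes.blit_string s 0 heap (ptr + 4) n;
  ptr

let to_string ptr = Bytes.sub_string heap (ptr + 4) (string_length ptr)
----

With this shape, `slice` needs no allocation logic of its own, which is the sense in which two callers pin down one shared helper.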

== Next slices (variable-string backend wall)

. *Runtime-length allocation ABI*: a bump-alloc variant taking a runtime
  byte count, plus a copy-loop primitive. The first consumers,
  `string_sub(s, start, len)` and `slice(s, lo, hi)` (bounded byte copies),
  between them fix the shared helper's signature.
. *`startsWith` / `string_find`*: prefix/substring scans (read side; they
  build on slice-1 indexing and need no allocation).
. *Concat + case-folding*: `++` on strings, `to_lowercase` / `to_uppercase`.
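
The read-side scans in the second item need only length and byte indexing. A minimal OCaml sketch, assuming slice 1's reader behaves like the mirror below (an out-of-range index yields `-1`); `starts_with`, `string_find`, and the `-1` not-found result are hypothetical choices, and the substring search is the naive O(n·m) scan rather than a specialised algorithm.

[source,ocaml]
----
(* Mirrors of the slice-1 readers over an ordinary OCaml string. *)
let string_length = String.length

let string_char_code_at s i =
  if i < 0 || i >= String.length s then -1 else Char.code s.[i]

(* Prefix scan: compare byte by byte, stopping at the first mismatch. *)
let starts_with s prefix =
  let n = string_length prefix in
  if n > string_length s then false
  else
    let rec go i =
      i >= n
      || (string_char_code_at s i = string_char_code_at prefix i
          && go (i + 1))
    in
    go 0

(* Naive substring search: first index where [needle] occurs, or -1. *)
let string_find s needle =
  let n = string_length s and m = string_length needle in
  let rec matches_at pos i =
    i >= m
    || (string_char_code_at s (pos + i) = string_char_code_at needle i
        && matches_at pos (i + 1))
  in
  let rec scan pos =
    if pos + m > n then -1
    else if matches_at pos 0 then pos
    else scan (pos + 1)
  in
  scan 0
----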

Each slice: add the codegen arm, an interp-parity e2e group, and a
`tests/codegen/*.mjs` executable check, then re-run the census and drop the
gated count.