fix(riscv64): relax out-of-range branches + frame-slot overlap (#1666) by octalide · Pull Request #1671 · briar-systems/mach

octalide · 2026-06-27T04:05:29Z

Closes #1666
Closes #1670
Closes #1672

Fixes the riscv64 backend so that large, real std code byte-encodes and runs correctly under qemu. Three distinct backend gaps surfaced while driving a real std workload (dns_nameserver, crypto sha256/sha512, bignum, large match chains, >2 KiB frames) through to a clean cross-compile + qemu run. Each fix is its own commit and is cross-checked against the unrelaxed x86_64 build. origin/dev (RV64A inline asm, #1669) is merged in, and the fixture exercises those atomics too.

Gap 1 - no long-branch relaxation (#1666)

Symptom: encoding fails with error: encode: riscv64 branch displacement exceeds the +-4KiB range on any function with a conditional branch spanning more than the B-type +-4 KiB reach. Trigger: std.system.os.linux.shared.dns_nameserver, whose var buf:[2048]u8 frame pushes stack offsets past 2 KiB. Those offsets no longer fit the 12-bit signed load/store immediate, so each access expands into a multi-instruction sequence, inflating the body past 4 KiB.
Root cause: RISC-V B-type conditional branches reach only +-4 KiB and J-type jumps only +-1 MiB. The pass-2 patcher rejected an out-of-range displacement instead of relaxing the branch.
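For reference, both ranges fall straight out of the immediate fields. The sketch below is a Python illustration (not the backend's mach code) of B-type and J-type encoding with a range check analogous to the one that made the pass-2 patcher reject the displacement; fits_signed, encode_b and encode_j are hypothetical names.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value fits in a `bits`-wide two's-complement field."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def encode_b(funct3: int, rs1: int, rs2: int, disp: int) -> int:
    """Encode a B-type conditional branch (major opcode BRANCH = 0x63)."""
    # 13-bit signed, 2-byte aligned immediate: reach is -4096 .. +4094 bytes.
    if disp % 2 or not fits_signed(disp, 13):
        raise ValueError(f"B-type displacement {disp} out of range")
    imm = disp & 0x1FFF
    # Layout: imm[12] | imm[10:5] | rs2 | rs1 | funct3 | imm[4:1] | imm[11] | opcode
    return (((imm >> 12) & 0x1) << 31 | ((imm >> 5) & 0x3F) << 25
            | rs2 << 20 | rs1 << 15 | funct3 << 12
            | ((imm >> 1) & 0xF) << 8 | ((imm >> 11) & 0x1) << 7 | 0x63)


def encode_j(rd: int, disp: int) -> int:
    """Encode a J-type jal (major opcode JAL = 0x6F)."""
    # 21-bit signed, 2-byte aligned immediate: reach is -1 MiB .. +1 MiB - 2.
    if disp % 2 or not fits_signed(disp, 21):
        raise ValueError(f"J-type displacement {disp} out of range")
    imm = disp & 0x1FFFFF
    # Layout: imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode
    return (((imm >> 20) & 0x1) << 31 | ((imm >> 1) & 0x3FF) << 21
            | ((imm >> 11) & 0x1) << 20 | ((imm >> 12) & 0xFF) << 12
            | rd << 7 | 0x6F)
```

For example, encode_b(0, 10, 11, 8) yields the word for beq a0, a1, 8, while encode_b(0, 10, 11, 4096) raises: one byte-pair past the reach is already unencodable.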
Fix: branch relaxation in the riscv64 encoder. A conditional past +-4 KiB becomes its inverted guard (a short branch skipping the trampoline) + a jal to the real target; a jal past +-1 MiB becomes an auipc t0,%hi ; jalr x0,t0,%lo pc-relative trampoline (full 32-bit reach). It runs as a per-function fixpoint after the body is emitted: each growth opens a text gap via the new shared encode.insert_text, which slides the rest of the function down and fixes the block / fixup / symbol / relocation tables, so each rescan remeasures against the grown layout. encode.block_offset is made public so the pass can resolve targets; patch_branch_riscv64 learns the auipc+jalr form. The inverted guard's skip is purely local and resolved in the pass; the trampoline's real-target displacement is left to pass 2 through the repointed fixup.
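A minimal Python sketch of the relaxation fixpoint, under simplifying assumptions: the function body is a flat list of hypothetical Item records, each rescan recomputes the whole layout instead of sliding the tail the way encode.insert_text does, and a guard-plus-jal whose jal is itself beyond +-1 MiB escalates to guard-plus-auipc/jalr (an assumption; the description above only names the two single-step relaxations).

```python
from dataclasses import dataclass

B_REACH = (-4096, 4094)                # B-type conditional branch
J_REACH = (-(1 << 20), (1 << 20) - 2)  # J-type jal


def reaches(disp: int, reach: tuple[int, int]) -> bool:
    lo, hi = reach
    return lo <= disp <= hi


def inverted_funct3(funct3: int) -> int:
    # beq/bne, blt/bge and bltu/bgeu differ only in funct3 bit 0.
    return funct3 ^ 1


@dataclass
class Item:
    kind: str       # "insn" | "label" | "bcond" | "jal"
    name: str = ""  # label name, or the branch's target label
    form: int = 0   # relaxation level; only ever grows


def size_of(it: Item) -> int:
    if it.kind == "label":
        return 0
    if it.kind == "insn":
        return 4
    if it.kind == "bcond":
        # 0: b<cc> target
        # 1: b<!cc> +8 ; jal x0, target
        # 2: b<!cc> +12 ; auipc t0, %hi ; jalr x0, t0, %lo   (assumed escalation)
        return (4, 8, 12)[it.form]
    if it.kind == "jal":
        # 0: jal x0, target    1: auipc t0, %hi ; jalr x0, t0, %lo
        return (4, 8)[it.form]
    raise ValueError(it.kind)


def layout(items):
    """Assign byte offsets to every item and resolve label addresses."""
    offsets, labels, pc = [], {}, 0
    for it in items:
        offsets.append(pc)
        if it.kind == "label":
            labels[it.name] = pc
        pc += size_of(it)
    return offsets, labels


def relax(items):
    """Grow out-of-range branches until no displacement overflows."""
    while True:
        offsets, labels = layout(items)
        changed = False
        for it, off in zip(items, offsets):
            if it.kind not in ("bcond", "jal"):
                continue
            disp = labels[it.name] - off
            if it.kind == "bcond" and it.form == 0 and not reaches(disp, B_REACH):
                it.form = 1
                changed = True
            elif it.kind == "bcond" and it.form == 1 and not reaches(disp - 4, J_REACH):
                it.form = 2  # the trampoline jal (at off + 4) is itself too far
                changed = True
            elif it.kind == "jal" and it.form == 0 and not reaches(disp, J_REACH):
                it.form = 1
                changed = True
        if not changed:
            return offsets, labels
```

Because sizes only grow, no branch's absolute displacement ever shrinks, so a branch relaxed in one rescan never needs to shrink back and the loop is guaranteed to terminate.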

Gap 2 - frame slots overlap the saved ra/s0 record (#1670)

Symptom: any riscv64 function with an address-taken local or a spill / aggregate slot (frame.size > 0) corrupts its own saved return address and segfaults at runtime (SIGSEGV, NULL deref). It byte-encodes fine, so byte-verify never caught it, and the register-only freestanding fixtures never exercised it.
Root cause: the prologue pins s0 at the frame top, with the 16-byte ra/s0 record and the callee-saved areas occupying the bytes immediately below it. The shared frame phase assigns local slot offsets just below the frame pointer (the x86 model: locals immediately below fp), so on riscv64 those land on top of the saved record - e.g. var buf:[256]u8 got s0-256 while ra was saved at s0-8, so zeroing the buffer clobbered ra.
Fix: bias the s0-relative slot offset down by frame_reserved_top (16 + callee-saved GP + FP bytes) so locals land below the reserved region. Localized to frame_slot_offset, mirrored in the assembly printer. s0 stays at the frame top, so incoming stack-argument access is unchanged.
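The overlap and the fix are plain offset arithmetic. A small Python model, assuming the shared frame phase hands back each slot as a depth below the frame pointer and that frame_reserved_top is 16 plus the callee-saved GP and FP area sizes in bytes (alignment padding ignored); slot_range and overlaps are hypothetical helpers.

```python
RA_S0_RECORD = 16  # saved ra at s0-8, saved s0 at s0-16


def frame_reserved_top(gp_bytes: int, fp_bytes: int) -> int:
    # 16-byte ra/s0 record plus the callee-saved GP and FP areas.
    return RA_S0_RECORD + gp_bytes + fp_bytes


def slot_range(depth: int, size: int, reserved: int) -> tuple[int, int]:
    """s0-relative [start, end) of a slot placed `depth` bytes below the
    frame pointer (x86 model), biased down by `reserved` bytes."""
    start = -(depth + reserved)
    return start, start + size


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    return a[0] < b[1] and b[0] < a[1]


# var buf:[256]u8 gets depth 256.
old = slot_range(256, 256, 0)                         # (-256, 0): covers ra at s0-8
new = slot_range(256, 256, frame_reserved_top(0, 0))  # (-272, -16): below the record
```

With zero bias the buffer spans s0-256 up to s0 and swallows the saved ra word; biased by the reserved top it ends exactly where the ra/s0 record begins.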

Gap 3 - 32-bit and/or/xor encode illegal word-group ops (#1672)

Symptom: a 32-bit bitwise and / or / xor encodes an illegal instruction and faults with SIGILL at runtime (surfaced while building std crypto / bignum, which are u32-heavy). It byte-encodes without error.
Root cause: encode_alu3 selected the major opcode through alu_opcode(width), which returns OP_32 (the RV64 word group) at 32-bit width - but RISC-V defines no andw / orw / xorw: in OP_32, funct3 7/6/4 with funct7 0 is reserved. add / sub / mul / shifts have valid word forms; only the logical ops do not.
Fix: thread a word_form flag through encode_alu3 so mul keeps its mulw word form while and / or / xor always encode as the full-register OP. A bitwise op of two consistently extended 32-bit operands yields a consistently extended result, so the full-register form is correct at every width.
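A Python analogue of the corrected opcode selection (not the backend's encode_alu3; the ALU table and encode_alu3_sketch are hypothetical). The word group is chosen only when the op has a real *w encoding.

```python
OP, OP_32 = 0x33, 0x3B  # RV64 major opcodes: full-register and word group

# op -> (funct3, funct7, has_word_form)
ALU = {
    "add": (0, 0x00, True),   # addw exists
    "sub": (0, 0x20, True),   # subw exists
    "mul": (0, 0x01, True),   # mulw exists (M extension)
    "and": (7, 0x00, False),  # no andw: OP_32 funct3 7 / funct7 0 is reserved
    "or":  (6, 0x00, False),  # no orw
    "xor": (4, 0x00, False),  # no xorw
}


def encode_alu3_sketch(op: str, rd: int, rs1: int, rs2: int, width: int) -> int:
    funct3, funct7, word_form = ALU[op]
    # Only ops that actually have a word encoding use OP_32 at 32-bit width.
    opcode = OP_32 if (width == 32 and word_form) else OP
    return funct7 << 25 | rs2 << 20 | rs1 << 15 | funct3 << 12 | rd << 7 | opcode
```

At width 32, "mul" still produces mulw a0, a1, a2, while "and" produces the same full-register and a0, a1, a2 word it would at width 64.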

Verification

Regression fixture (test/riscv64): extends the freestanding rv64 fixture with a >4 KiB parse_probe (forces long-branch relaxation - verify.sh asserts the inverted-guard + jal sequence), a stack_probe (stack-local frame slot), and a bitmix (32-bit bitwise word-group). Folded with the existing const-shift and RV64A atomics probes into a qemu exit code (70) asserted by verify.sh; the code matches the unrelaxed x86_64 build, so a regression in any fix changes it. verify.sh now disassembles with --mattr=+m,+f,+d,+a (the backend emits no .riscv.attributes ISA string yet, so objdump must be told the extensions) and reads grep inputs from here-strings (a large disassembly tripped a SIGPIPE under pipefail).
Real std consumers cross-compiled for linux-riscv64 and run under qemu, each matching the native x86_64 result: sha256 / sha512 / bignum / wide mul-div-rem; math.bits (clz/ctz/popcount/rotate/byteswap/bitreverse) / math (min/max/abs/log2/sat_*) / fnv1a / RV64D float (add/mul/div/cvt/cmp) / 32-bit div-rem word forms; and a heavy combined sha256-over-4KiB x8 + bignum workload.
Self-host fixpoint holds (mach build . -o a && a build . -o b && b build . -o c && cmp b c -> b == c byte-identical).
mach test . passes (666 passed, 0 failed). x86_64 and aarch64 unaffected (the fixpoint + existing cross lanes).

Minor follow-up observation (not in scope here, does not affect qemu execution): the riscv64 object writer emits no .riscv.attributes ISA-string section, so external tools default to base RV64I and render valid M/F/D/A words as <unknown> unless --mattr is passed.

🤖 Generated with Claude Code

RISC-V B-type conditional branches reach only +-4 KiB and J-type jumps only +-1 MiB, so a function whose encoded body exceeds those ranges could not encode: the patcher rejected the overflowing displacement instead of relaxing it. A large stack frame (>2 KiB offsets expanding into multi-instruction sequences) inflates a body past 4 KiB and overflows a conditional branch spanning it. Add branch relaxation to the riscv64 encoder. A conditional past +-4 KiB becomes its inverted guard (a short branch skipping the trampoline) plus a jal to the real target; a jal past +-1 MiB becomes an auipc+jalr pc-relative trampoline. The relaxation runs as a per-function fixpoint after the body is emitted: each growth opens a text gap that slides the rest of the function down and fixes the block / fixup / symbol / relocation tables, so each rescan remeasures against the grown layout. The new shared encode.insert_text owns the ISA-general gap mechanism; encode.block_offset is made public so the pass can resolve targets. Closes #1666
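The auipc+jalr trampoline needs the usual %hi/%lo split: jalr sign-extends its 12-bit immediate, so %hi is rounded by 0x800 to absorb the borrow whenever bit 11 of the displacement is set. A Python sketch, assuming t0 as the scratch register (matching the auipc t0,%hi ; jalr x0,t0,%lo form in the PR description); split_hi_lo and encode_trampoline are hypothetical names.

```python
T0 = 5  # x5 / t0, the trampoline's scratch register


def split_hi_lo(disp: int) -> tuple[int, int]:
    """Split a pc-relative displacement into auipc (%hi) and jalr (%lo) parts."""
    # Round so that lo lands in [-2048, 2047] after jalr sign-extends it.
    hi = (disp + 0x800) >> 12
    lo = disp - (hi << 12)
    if not -(1 << 19) <= hi < (1 << 19):
        raise ValueError(f"displacement {disp} exceeds auipc+jalr reach")
    return hi, lo


def encode_trampoline(disp: int) -> tuple[int, int]:
    """Encode auipc t0, %hi ; jalr x0, t0, %lo for a displacement from the auipc."""
    hi, lo = split_hi_lo(disp)
    auipc = (hi & 0xFFFFF) << 12 | T0 << 7 | 0x17  # U-type, opcode AUIPC
    jalr = (lo & 0xFFF) << 20 | T0 << 15 | 0x67    # I-type, funct3 0, rd x0
    return auipc, jalr
```

For a displacement of 0x1800, the naive split (hi 1, lo 0x800) would fail because jalr reads 0x800 as -2048; the rounded split gives hi 2 and lo -2048 instead.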

Any riscv64 function with an address-taken local or a spill / aggregate slot (frame.size > 0) corrupted its own saved return address and segfaulted at runtime. The prologue pins s0 at the frame top, with the 16-byte ra / s0 record and the callee-saved areas occupying the bytes immediately below it, but the shared frame phase assigns local slot offsets just below the frame pointer (the x86 model). On riscv64 those landed on top of the saved record - e.g. a [256]u8 buffer at s0-256 while ra was saved at s0-8, so zeroing the buffer clobbered ra. Bias the s0-relative slot offset down by frame_reserved_top (16 + callee-saved GP + FP bytes) so locals land below the reserved region. Localized to frame_slot_offset, mirrored in the assembly printer; s0 stays at the frame top so incoming stack-argument access is unchanged. It byte-encoded fine, so byte-verify never caught it and the register-only freestanding fixtures never exercised it.

Extend the freestanding rv64 fixture with two probes folded into the qemu exit code, so a regression in either fix changes the asserted code:

- parse_probe: a deterministic parse over a [2048]u8 stack buffer. The large frame inflates the encoded body past 4KiB and leaves a conditional branch out of the B-type +-4KiB range, exercising long-branch relaxation (#1666). verify.sh asserts the inverted-guard + jal sequence is present.
- stack_probe: a small stack-local buffer written then read back, exercising frame-slot addressing (#1670).

The result matches the unrelaxed x86_64 / aarch64 build (exit code 68).

verify.sh: disassemble with --mattr=+m,+f,+d so the <unknown> guard does not false-positive on valid M / F / D words (the backend emits no .riscv.attributes ISA string yet), and read grep inputs from here-strings so a large disassembly does not trip a SIGPIPE under pipefail.

…h-relax

# Conflicts:
# test/riscv64/src/main.mach
# test/riscv64/verify.sh

A 32-bit bitwise and / or / xor encoded an illegal instruction and faulted with SIGILL at runtime (surfaced building std crypto / bignum, which is u32-heavy). encode_alu3 selected the major opcode through alu_opcode(width), which returns OP_32 (the RV64 word group) at 32-bit width - but RISC-V defines no andw / orw / xorw, so the word matched no encoding (funct3 6/7/4 with funct7 0 in OP_32 is reserved). Bitwise ops have no word form: they are bit-for-bit identical at any width, and a bitwise op of two consistently extended 32-bit operands yields a consistently extended result. Thread a word_form flag through encode_alu3 so mul keeps its mulw word form while and / or / xor always encode as the full-register OP. add / sub / shifts / mul / div already use valid word forms, so only the logical ops were affected. Verified: a std sha256 / sha512 / bignum consumer cross-compiled for linux-riscv64 now runs under qemu to the same result as the native build, and a 32-bit bitwise probe in the fixture matches the native value.
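The claim that bitwise ops need no word form can be checked mechanically. This Python sketch models 64-bit registers and confirms, over a handful of edge-case values, that a full-width and / or / xor of two sign-extended (or two zero-extended) 32-bit operands equals the extended 32-bit result; sext32, zext32 and full_width_matches_word are hypothetical helpers.

```python
MASK64 = (1 << 64) - 1


def sext32(x: int) -> int:
    """Sign-extend the low 32 bits of x to a 64-bit register value."""
    x &= 0xFFFFFFFF
    return (x - (1 << 32) if x & 0x80000000 else x) & MASK64


def zext32(x: int) -> int:
    """Zero-extend the low 32 bits of x to a 64-bit register value."""
    return x & 0xFFFFFFFF


OPS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


def full_width_matches_word(op: str, a32: int, b32: int, extend) -> bool:
    # Full-register op on extended operands vs. extended 32-bit result.
    full = OPS[op](extend(a32), extend(b32)) & MASK64
    return full == extend(OPS[op](a32, b32))
```

The upper 32 bits of each extended operand are copies of its bit 31 (or all zero), so the op computes the same bit there as at bit 31, which is exactly what extending the 32-bit result produces.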

octalide added 5 commits June 27, 2026 00:04

Merge remote-tracking branch 'origin/dev' into fix/1666-riscv64-branc…

61d081f


octalide marked this pull request as ready for review June 27, 2026 04:33

octalide merged commit 5afdbdc into dev Jun 27, 2026
10 checks passed

octalide deleted the fix/1666-riscv64-branch-relax branch June 27, 2026 04:34

octalide mentioned this pull request Jun 27, 2026

link(riscv64): object writer omits .riscv.attributes ISA-string section #1673

Closed
