Skip to content

release: v2.9.0#1675

Merged
octalide merged 23 commits into
mainfrom
dev
Jun 27, 2026
Merged

release: v2.9.0#1675
octalide merged 23 commits into
mainfrom
dev

Conversation

@octalide

Copy link
Copy Markdown
Collaborator

Integration merge of dev into main for the v2.9.0 release.

Since v2.8.0:

Tag v2.9.0 pushed on the merge commit. Makes riscv64 compile + run real std, unblocking mach-std #308/#307.

octalide added 23 commits June 26, 2026 22:54
…ed pointers

Implement the information-flow type rules for the `^` secret qualifier (#1645),
building on the frontend tokens/AST from #1644.

- type universe: TYPE_SECRET wraps an inner type as the top of a two-point
  secrecy lattice; intern_secret dedups and collapses `^^T`. Classifier
  predicates (is_integer/is_numeric/is_float/is_pointer_like) and byte_size see
  through `^`; carries_secret and secrecy_structure_equal back the welded rules.
- resolve: bind `^T` inner names.
- infer: resolve `^T` to TYPE_SECRET; JOIN secrecy through binary/unary results;
  taint field/element reads from secret containers; `:^` strip (the sole
  downgrade); secret coerces up the lattice with no syntax.
- gates: secret branch/loop condition, secret memory index, secret `/`/`%`
  operand, and secret `&&`/`||` operand are compile errors with teaching notes.
- welded storage: `*^T` cannot erase to `ptr`, `::`/`:~` cannot add or drop `^`,
  and a `uni`'s overlapping variants must agree on secrecy.
- lowering: secrecy has no runtime representation, so `^T` lowers to `T`'s shape
  and `:^` is a value passthrough; mangling and generic substitution thread it.

The `#[oblivious]` contract and the secret-taint-to-MIR are out of scope here
and land in #1646.
- driver: end-to-end sema tests for the secret qualifier (#1645) - the clean
  join/coerce/strip program, the four leakage gates, `::`/`:~` secrecy
  preservation, welded-pointer non-launderability, and uni secrecy agreement,
  each with a passing public counterpart.
- doc/language: new secrecy.md reference (lattice, join, gates, `:^`, welded
  pointers, trusted base); grammar/operators/types/uni cross-link it and drop
  the "not yet enforced" notes now that sema enforces the rules.
The darwin os vtable advertised "_main" as the program entry, but the std
darwin runtime defines the entry as "start" (the conventional Mach-O thread
entry that reads the raw kernel stack). Linking a real darwin executable
therefore failed to resolve the entry. Point the vtable at "start" so the
linker keys the entry symbol the runtime provides.

Refs #1178
`&&` / `||` short-circuit by branching on the LEFT operand (the right is
evaluated conditionally), so only a secret left operand is the secret-dependent
branch the leakage model observes. A secret right operand merely taints the
result via the join. Narrowing the gate from either-operand to left-only stops
rejecting the sound `public && secret` while still catching the real branch
leak. Test matrix and secrecy.md updated to match.
Implements emit_dyn_exec for the Mach-O format, framing a dyld-loadable
MH_EXECUTE: a leading __PAGEZERO, a __TEXT that maps the mach header below the
linker's text segment so dyld can read the load commands, the linker's segments,
synthesized __STUBS/__GOT segments for function imports, and a __LINKEDIT
carrying an LC_DYLD_INFO_ONLY bind stream plus the symbol/string tables. Emits
LC_LOAD_DYLINKER (/usr/lib/dyld), one LC_LOAD_DYLIB per dependency,
LC_SYMTAB/LC_DYSYMTAB, and the static LC_UNIXTHREAD entry. Each function import's
got slot is bound non-lazily and every deferred call site is redirected to a
per-arch import stub (x86_64 rip-relative jump, arm64 adrp/ldr/br). Both
architectures are supported; the writer is registered on the vtable.

Also round-trips a symbol's function-ness through the relocatable object: an
undefined symbol carries no n_type class, so emit_object records it as
REFERENCE_FLAG_UNDEFINED_LAZY in n_desc and parse_object recovers it, which the
linker reads to give a dynamic function import its call stub.

Unit tests cover the dynamic-exec skeleton for both architectures.

Refs #1178
A `.dylib` link input against a Mach-O target is now recorded as an
LC_LOAD_DYLIB dependency by its install name, with no on-disk probe (dyld
resolves it at run time, and a system dylib is absent on a cross host). This
mirrors the existing PE `.dll` handling and lets a darwin program declare a
dynamic dependency, driving the linker down the dynamic-exec path. A `.dylib`
against a non-Mach-O target is rejected.

Refs #1178
Updates the static fixture and verify lane for the "start" entry symbol, and adds
a dynamic fixture (a libprobe.dylib dependency and a probe_add import) that drives
emit_dyn_exec. The verify lane now cross-compiles both fixtures for both Darwin
triples and byte-verifies the dynamic load commands (LC_LOAD_DYLINKER,
LC_LOAD_DYLIB, LC_DYLD_INFO_ONLY bind stream, LC_SYMTAB/LC_DYSYMTAB) and the
import stub with the llvm tools. No execution step: Darwin binaries cannot run on
the Linux host, and an arm64 Darwin binary additionally needs a code signature
this writer does not emit.

Refs #1178
feat(sema): secret-qualifier flow typing - join, gate, welded-storage pointers
Extend the riscv64 inline-asm assembler with the A-extension subset that
std's riscv64 atomic arms need: lr.w/lr.d, sc.w/sc.d, the amo* read-
modify-writes (swap/add/and/or/xor/max/min/maxu/minu) in .w and .d, the
.aq/.rl/.aqrl ordering suffixes, and pause for spin_hint.

The AMO major opcode (0x2F) is R-type with funct5 in bits [31:27] and the
aq/rl bits in [26:25], so the assembled funct7 is (funct5 << 2) | (aq <<
1) | rl. Atomics take the base-only (rs1) address form with no
displacement, unlike regular loads and stores. lr is the two-operand
rd, (rs1) form; sc and the amo* forms are rd, rs2, (rs1).

The suffixes are parsed inline so a future rv32 reuses the .w forms. A
byte-test covers every mnemonic and ordering variant against llvm-mc
-mattr=+a, plus the no-displacement and unsupported-prefix rejections.

Part of #1668.
# Conflicts:
#	src/lang/target/of/macho.mach
…nalyzer

The inline-asm clobber analyzer drove the new atomic mnemonics to the
conservative all-register fallback. Classify them precisely: lr / sc /
amo* each write their rd operand (the leftmost register), and pause
writes nothing like fence. A prefix test (lr. / sc. / amo) covers the
whole family across its width and ordering suffixes, so the allocator no
longer over-preserves around an atomic block. Extends the clobber test.

Part of #1668.
feat(target): macOS / Darwin OS support (runtime entry, syscalls, dynamic exec)
RISC-V B-type conditional branches reach only +-4 KiB and J-type jumps
only +-1 MiB, so a function whose encoded body exceeds those ranges could
not encode: the patcher rejected the overflowing displacement instead of
relaxing it. A large stack frame (>2 KiB offsets expanding into
multi-instruction sequences) inflates a body past 4 KiB and overflows a
conditional branch spanning it.

Add branch relaxation to the riscv64 encoder. A conditional past +-4 KiB
becomes its inverted guard (a short branch skipping the trampoline) plus a
jal to the real target; a jal past +-1 MiB becomes an auipc+jalr
pc-relative trampoline. The relaxation runs as a per-function fixpoint
after the body is emitted: each growth opens a text gap that slides the
rest of the function down and fixes the block / fixup / symbol /
relocation tables, so each rescan remeasures against the grown layout. The
new shared encode.insert_text owns the ISA-general gap mechanism;
encode.block_offset is made public so the pass can resolve targets.

Closes #1666
Any riscv64 function with an address-taken local or a spill / aggregate
slot (frame.size > 0) corrupted its own saved return address and
segfaulted at runtime. The prologue pins s0 at the frame top, with the
16-byte ra / s0 record and the callee-saved areas occupying the bytes
immediately below it, but the shared frame phase assigns local slot
offsets just below the frame pointer (the x86 model). On riscv64 those
landed on top of the saved record - e.g. a [256]u8 buffer at s0-256 while
ra was saved at s0-8, so zeroing the buffer clobbered ra.

Bias the s0-relative slot offset down by frame_reserved_top (16 +
callee-saved GP + FP bytes) so locals land below the reserved region.
Localized to frame_slot_offset, mirrored in the assembly printer; s0 stays
at the frame top so incoming stack-argument access is unchanged. It
byte-encoded fine, so byte-verify never caught it and the register-only
freestanding fixtures never exercised it.
Extend the freestanding rv64 fixture with the RV64A atomics: an lr.d/sc.d
compare-and-swap loop swaps a stack cell 0 -> 7 (the reservation-retry
and mismatch arms are encoded though the fast path takes neither), then
amoadd.d adds 4 and returns the swapped value. the folded result (prior 7
+ post-rmw 11 = 18) is added to the exit code, so a wrong atomic encoding
or a lost store changes it. the atomics run in _start, matching how the
fixture already emits its inline-asm exit.

verify.sh asserts the new exit code 71 and that lr.d / sc.d / amoadd.d
disassemble as the real instructions (--mattr=+a enables the A-extension
decode so the words are not <unknown>).

Part of #1668.
Extend the freestanding rv64 fixture with two probes folded into the qemu
exit code, so a regression in either fix changes the asserted code:

- parse_probe: a deterministic parse over a [2048]u8 stack buffer. The
  large frame inflates the encoded body past 4KiB and leaves a conditional
  branch out of the B-type +-4KiB range, exercising long-branch relaxation
  (#1666). verify.sh asserts the inverted-guard + jal sequence is present.
- stack_probe: a small stack-local buffer written then read back,
  exercising frame-slot addressing (#1670).

The result matches the unrelaxed x86_64 / aarch64 build (exit code 68).

verify.sh: disassemble with --mattr=+m,+f,+d so the <unknown> guard does
not false-positive on valid M / F / D words (the backend emits no
.riscv.attributes ISA string yet), and read grep inputs from here-strings
so a large disassembly does not trip a SIGPIPE under pipefail.
feat(riscv64): RV64A atomic inline-asm mnemonics (lr/sc, amo*, ordering suffixes)
…h-relax

# Conflicts:
#	test/riscv64/src/main.mach
#	test/riscv64/verify.sh
A 32-bit bitwise and / or / xor encoded an illegal instruction and faulted
with SIGILL at runtime (surfaced building std crypto / bignum, which is
u32-heavy). encode_alu3 selected the major opcode through alu_opcode(width),
which returns OP_32 (the RV64 word group) at 32-bit width - but RISC-V
defines no andw / orw / xorw, so the word matched no encoding (funct3 6/7/4
with funct7 0 in OP_32 is reserved).

Bitwise ops have no word form: they are bit-for-bit identical at any width,
and a bitwise op of two consistently extended 32-bit operands yields a
consistently extended result. Thread a word_form flag through encode_alu3 so
mul keeps its mulw word form while and / or / xor always encode as the
full-register OP. add / sub / shifts / mul / div already use valid word
forms, so only the logical ops were affected.

Verified: a std sha256 / sha512 / bignum consumer cross-compiled for
linux-riscv64 now runs under qemu to the same result as the native build,
and a 32-bit bitwise probe in the fixture matches the native value.
fix(riscv64): relax out-of-range branches + frame-slot overlap (#1666)
@octalide octalide merged commit f76d4ee into main Jun 27, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant