codegen: store singular message fields inline by default (PointerRepr::Inline) (#248)#250

Merged: iainmcgin merged 3 commits into main from iain/inline-messagefield on Jun 27, 2026
@iainmcgin (Collaborator) commented Jun 27, 2026

Closes #248.

Summary

Makes inline storage the default for singular message fields: codegen now emits MessageField<T, ::buffa::Inline<T>> (laid out as Option<T>, no per-field heap allocation) for every non-recursive field. Recursive fields are detected at codegen time and stay on Box automatically, so the default is always sized.

  • buffa::Inline<T> — a #[repr(transparent)] ProtoBox<T> newtype that stores T directly.
  • PointerRepr::Inline — now #[default]. The recursion DFS (inline_is_recursive) follows both inline-edge kinds — rule-matched unbox_oneof variants and Inline-resolved singular fields — so a cycle through one of each is caught for both knobs. ctx.pointer_repr() demotes any Inline not in the precomputed safe set to Box; oneof-variant paths are never in that set, so a blanket Inline rule can't bypass unbox_oneof's guard.
  • Opt-out is the existing box_type_in(PointerRepr::Box, &[".pkg.Msg.field"]) (per field/prefix) or box_type(PointerRepr::Box) (restore the old global default). box_type_in now normalizes a missing leading dot — previously a dotless path silently matched nothing.
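The layout claim above can be illustrated with a minimal, self-contained sketch. The `ProtoBox`, `Inline`, and `MessageField` definitions here are hypothetical stand-ins written for this example, not buffa's actual definitions; only the names and the `#[repr(transparent)]` newtype idea come from the description above.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Hypothetical stand-in for `ProtoBox<T>`: a wrapper that owns one `T`.
trait ProtoBox<T> {
    fn wrap(value: T) -> Self;
    fn get(&self) -> &T;
}

/// Boxed storage: one heap allocation per set field.
impl<T> ProtoBox<T> for Box<T> {
    fn wrap(value: T) -> Self { Box::new(value) }
    fn get(&self) -> &T { &**self }
}

/// Inline storage: a transparent newtype that holds `T` directly.
#[repr(transparent)]
struct Inline<T>(T);

impl<T> ProtoBox<T> for Inline<T> {
    fn wrap(value: T) -> Self { Inline(value) }
    fn get(&self) -> &T { &self.0 }
}

/// Hypothetical stand-in for `MessageField`; `P` defaults to the boxed form.
struct MessageField<T, P: ProtoBox<T> = Box<T>> {
    inner: Option<P>,
    _marker: PhantomData<T>,
}

impl<T, P: ProtoBox<T>> MessageField<T, P> {
    fn some(value: T) -> Self {
        Self { inner: Some(P::wrap(value)), _marker: PhantomData }
    }
    fn as_option(&self) -> Option<&T> {
        self.inner.as_ref().map(|p| p.get())
    }
}

#[derive(Debug, PartialEq)]
struct Vertex { x: f32, y: f32, z: f32 }

fn main() {
    let boxed: MessageField<Vertex> =
        MessageField::some(Vertex { x: 1.0, y: 2.0, z: 3.0 });
    let inline: MessageField<Vertex, Inline<Vertex>> =
        MessageField::some(Vertex { x: 1.0, y: 2.0, z: 3.0 });

    // Both representations expose the same value.
    assert_eq!(boxed.as_option(), inline.as_option());
    // Boxed form is pointer-sized; the Vertex itself lives on the heap.
    assert_eq!(size_of::<MessageField<Vertex>>(), size_of::<usize>());
    // Inline form occupies the same space as Option<Vertex>; no allocation.
    assert_eq!(
        size_of::<MessageField<Vertex, Inline<Vertex>>>(),
        size_of::<Option<Vertex>>()
    );
}
```

The trade-off the sketch makes visible: the inline form makes the parent struct larger but avoids the allocation, which is also why a self-referential field cannot use it.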

Benchmark validation

TriMesh is a new benchmark message: 256 Face elements per payload, each with four singular Vertex submessages. This is the workload shape where boxed storage costs four heap allocations per element. On bare metal (c7i.metal-24xl, 10s measurement), inline relative to box (negative means faster):

| op | mesh (target) | analytics_event (control) |
|---|---|---|
| decode | −36.3% (2.08 → 1.32 ms) | +4.5% (p=0.44, noise) |
| merge | −55.9% | −3.8% (noise floor) |
| decode_view | −15.4% | −1.9% |
| json_decode | −11.0% | −0.4% |
| encode / compute_size / json_encode | flat | flat |

AnalyticsEvent has zero singular message fields (everything message-typed is repeated), so its flat result confirms the change doesn't perturb messages it can't help.

Fallout

  • json_helpers::message_field_always_present and pool::clone_options genericized over P: ProtoBox<T> — the only two runtime helpers that hardcoded the Box pointer.
  • Bootstrap descriptor types regenerated (*Options fields now inline). WKTs are byte-identical — none has a singular message field.
  • DESIGN.md, docs/guide.md, and the MessageField/ProtoBox docs updated for the inline default.

Breaking change

Generated singular message field types change from MessageField<T> to MessageField<T, Inline<T>>. Explicit MessageField<Foo> annotations now mean the boxed form and will mismatch — drop the annotation and let P infer from the field's declared type. See the changelog entry for the full migration note.
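A minimal sketch of the migration, using hypothetical stand-in types rather than buffa's real ones. The only assumption taken from the text above is that the bare `MessageField<T>` name resolves to the boxed form through a default type parameter; `Face`, `Vertex`, `unset`, and `is_set` are illustrative names.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for `buffa::Inline<T>`.
#[allow(dead_code)]
struct Inline<T>(T);

/// Hypothetical stand-in for `MessageField`; `P` defaults to `Box<T>`,
/// so the bare name `MessageField<T>` means the boxed form.
struct MessageField<T, P = Box<T>> {
    inner: Option<P>,
    _marker: PhantomData<T>,
}

impl<T, P> MessageField<T, P> {
    fn unset() -> Self {
        Self { inner: None, _marker: PhantomData }
    }
    fn is_set(&self) -> bool {
        self.inner.is_some()
    }
}

struct Vertex;

/// Assumed shape of a generated message after this change.
struct Face {
    a: MessageField<Vertex, Inline<Vertex>>,
}

fn first_vertex_is_set(face: Face) -> bool {
    // An explicit annotation written against the old output no longer
    // type-checks, because it names MessageField<Vertex, Box<Vertex>>:
    //     let a: MessageField<Vertex> = face.a;
    // Dropping the annotation lets P be inferred as Inline<Vertex>.
    let a = face.a;
    a.is_set()
}

fn main() {
    let face = Face { a: MessageField::unset() };
    assert!(!first_vertex_is_set(face));
}
```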

Adds recursion-aware inline storage for singular message fields, mirroring
unbox_oneof():

- buffa::Inline<T>: a repr(transparent) ProtoBox<T> newtype, so
  MessageField<T, Inline<T>> is laid out as Option<T> with no heap
  allocation.
- PointerRepr::Inline: selects it. Recursion-aware — at context build,
  resolve_inlined_fields precomputes the set of singular fields where the
  raw repr is Inline and the inline-edge DFS finds no cycle;
  ctx.pointer_repr() demotes any Inline not in that set to Box. So
  box_type(PointerRepr::Inline) is also recursion-safe, and oneof variant
  paths (never in the set) auto-demote to Box so they can't bypass
  unbox_oneof's guard.
- Config::unbox_message_fields[_in](): sugar for
  box_type_in(PointerRepr::Inline, paths) with leading-dot normalization.

The recursion DFS (renamed inline_is_recursive) now follows both edge
kinds — rule-matched oneof variants and raw-Inline singular fields — so a
cycle through one of each is caught for both knobs.
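The recursion check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `reaches_owner` and `safe_inline_set` are hypothetical names, the graph is reduced to message-name strings, both edge kinds are treated as one edge list, and every field that lies on a cycle is demoted (the real code may resolve cycles differently).

```rust
use std::collections::{HashMap, HashSet};

/// Owner message -> messages it would embed through inline edges.
type InlineEdges<'a> = HashMap<&'a str, Vec<&'a str>>;

/// Iterative DFS from `target`: returns true if `owner` is reachable,
/// meaning that inlining the field would make `owner` contain itself.
fn reaches_owner<'a>(edges: &InlineEdges<'a>, owner: &str, target: &'a str) -> bool {
    let mut stack = vec![target];
    let mut seen = HashSet::new();
    while let Some(msg) = stack.pop() {
        if msg == owner {
            return true;
        }
        if !seen.insert(msg) {
            continue; // already explored
        }
        if let Some(next) = edges.get(msg) {
            stack.extend(next.iter().copied());
        }
    }
    false
}

/// Candidates are (field path, owner message, target message).
/// Returns the paths that can stay inline; the rest fall back to Box.
fn safe_inline_set<'a>(fields: &[(&'a str, &'a str, &'a str)]) -> HashSet<&'a str> {
    let mut edges: InlineEdges<'a> = HashMap::new();
    for &(_, owner, target) in fields {
        edges.entry(owner).or_default().push(target);
    }
    fields
        .iter()
        .filter(|&&(_, owner, target)| !reaches_owner(&edges, owner, target))
        .map(|&(path, _, _)| path)
        .collect()
}

fn main() {
    let fields = [
        (".mesh.Face.a", "Face", "Vertex"),      // leaf target: safe
        (".list.Node.next", "Node", "Node"),     // direct self-reference
        (".pair.A.b", "A", "B"),                 // two-message cycle
        (".pair.B.a", "B", "A"),
        (".list.Holder.head", "Holder", "Node"), // points at a cycle, not on it
    ];
    let safe = safe_inline_set(&fields);
    assert!(safe.contains(".mesh.Face.a"));
    assert!(safe.contains(".list.Holder.head"));
    assert!(!safe.contains(".list.Node.next"));
    assert!(!safe.contains(".pair.A.b"));
    assert!(!safe.contains(".pair.B.a"));
}
```

Running one traversal per candidate keeps the sketch simple at the cost of repeated work on large graphs.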

Default codegen output is byte-identical to main (PointerRepr defaults to
Box; WKT regen confirms no diff).
@github-actions

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

TriMesh is a mesh-shaped workload (256 Face elements per payload, each with
four singular Vertex submessages) — the case where boxed MessageField storage
costs four heap allocations per element on decode. None of the existing
benchmark messages has more than one singular message field per payload, so
this is the regression target for PointerRepr::Inline (#248).

- bench_messages.proto + iso/mesh.proto: TriMesh / Face / Vertex
- gen-datasets: gen_mesh (appended last so existing dataset RNG state is
  unperturbed); 50x256-face dataset (~896KB)
- bench-buffa: mesh feature, isolated bench target, protobuf.rs entries,
  inline_fields feature that calls .unbox_message_fields() in build.rs for
  A/B builds

Iso-guard cfg lists in the existing per-message bench targets gain
feature = "mesh" so isolation still holds.
…lds()

Non-recursive singular message fields are now stored inline by default
(MessageField<T, ::buffa::Inline<T>>, laid out as Option<T>). The recursion
guard runs unconditionally so the default is always sized; recursive fields
stay on Box automatically.

The opt-out is the existing box_type_in(PointerRepr::Box, paths) (or
box_type(PointerRepr::Box) for the old global default), so the
unbox_message_fields[_in]() builders added in commit 1 are removed as
redundant. box_type_in now normalizes a missing leading dot — previously a
dotless path silently matched nothing.
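A small sketch of what leading-dot normalization could look like. `normalize_path` and `rule_matches` are hypothetical helpers for illustration, and the segment-wise prefix matching is an assumption about how per-field and per-prefix rules are compared, not a description of buffa's code.

```rust
/// Ensure a rule path is fully qualified with a leading dot.
fn normalize_path(path: &str) -> String {
    if path.starts_with('.') {
        path.to_string()
    } else {
        format!(".{path}")
    }
}

/// Assumed matching rule: exact field path, or a prefix ending on a
/// segment boundary (so ".pkg.Ms" does not match ".pkg.Msg.field").
fn rule_matches(rule: &str, field_fqn: &str) -> bool {
    let rule = normalize_path(rule);
    match field_fqn.strip_prefix(rule.as_str()) {
        Some(rest) => rest.is_empty() || rest.starts_with('.'),
        None => false,
    }
}

fn main() {
    // A dotless rule now behaves like its dotted equivalent.
    assert_eq!(normalize_path("pkg.Msg.field"), ".pkg.Msg.field");
    assert!(rule_matches("pkg.Msg.field", ".pkg.Msg.field"));
    assert!(rule_matches(".pkg.Msg", ".pkg.Msg.field"));
    assert!(!rule_matches(".pkg.Ms", ".pkg.Msg.field"));
}
```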

Fallout absorbed:
- json_helpers::message_field_always_present and pool::clone_options
  genericized over P: ProtoBox<T> (the only two runtime helpers that
  hardcoded the Box pointer)
- Bootstrap descriptor types regenerated (FileOptions/MessageOptions/etc.
  now inline). WKTs are byte-identical — none has a singular message field.
- DESIGN.md / docs/guide.md / MessageField docs updated to describe the
  inline default
- resolve_unboxed_variants and resolve_inlined_fields now take the prebuilt
  message index so it's built once per codegen, not twice

The unconditional resolve_inlined_fields is O(F*(V+E)): a fresh DFS over
the message graph (V messages, E inline edges) for each of the F candidate
fields. Noted in the doc; memoize per-target reachable sets if a very large
schema makes it noticeable.
@iainmcgin iainmcgin changed the title codegen: add PointerRepr::Inline + unbox_message_fields() (#248) codegen: store singular message fields inline by default (PointerRepr::Inline) (#248) Jun 27, 2026
@iainmcgin iainmcgin marked this pull request as ready for review June 27, 2026 01:46
@iainmcgin iainmcgin requested a review from rpb-ant June 27, 2026 01:46
@iainmcgin iainmcgin added this pull request to the merge queue Jun 27, 2026
Merged via the queue into main with commit a2384e1 Jun 27, 2026
9 checks passed
@iainmcgin iainmcgin deleted the iain/inline-messagefield branch June 27, 2026 18:16
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 27, 2026
Development

Successfully merging this pull request may close these issues.

codegen: recursion-aware inline storage for singular message fields (PointerRepr::Inline + unbox_message_fields)

2 participants