Skip to content

Phase 6.4f.4 (a+c+b): runtime LOD primitives + SPZ higher-order SH#97

Open
Kyle-Wang0211 wants to merge 51 commits into
claude/phase-6.4f.3-splat-memoryfrom
claude/phase-6.4f.4-runtime-lod
Open

Phase 6.4f.4 (a+c+b): runtime LOD primitives + SPZ higher-order SH#97
Kyle-Wang0211 wants to merge 51 commits into
claude/phase-6.4f.3-splat-memoryfrom
claude/phase-6.4f.4-runtime-lod

Conversation

@Kyle-Wang0211

Copy link
Copy Markdown
Owner

Stacked on PR #96 (Phase 6.4f.3)

This PR builds on #96. Reviewers can either:

Sub-deliverables

What Status Notes
a SPZ higher-order SH decode (degree 1/2/3) ✅ shipped enables 6.4f.3.b's max_sh_degree cap to actually apply on SPZ
c Bhattacharyya-style importance-weighted leaf merge ✅ shipped replaces 6.4f.3.d's single-rep-per-leaf — preserves color + spatial extent
b Runtime LOD via per-splat extent cull in project_forward ✅ shipped (lightweight subset) NOT the full Octree-GS multi-level GPU traversal — see scope notes

Honest scope on (b)

The user requested "Octree-GS-style per-frame node selection + select_lod.wgsl + project_forward accepting active_indices". The full implementation needs:

  • Augmented `packed_splats` buffer holding all original leaves + merged interiors at every tree level
  • New `select_lod.wgsl` compute kernel walking the flattened octree and emitting active leaves into an indirect buffer
  • `project_forward.wgsl` modified to read `packed_splats[active_indices[gid]]` and dispatch with `num_active` workgroups
  • Multi-pass dispatch chain: `select_lod` → `project_forward(indirect)` → `project_visible` → sort → render
  • All BindGroup / pipeline plumbing for the new kernel

That's a multi-day undertaking. What this PR ships instead is the lightweight subset that fits inside `project_forward`'s existing early-exit path:

```wgsl
// project_forward.wgsl, after the bbox extent calc:
if (max(bbox.x, bbox.y) < uniforms.lod_extent_min) { return; }
```

Combined with (c)'s merged leaves at load, this gives functional two-level LOD:

  • Far view: tiny projected splats early-exit; merged-leaf coverage holds the silhouette
  • Close view: original splats render at full density

Performance: project_forward atomic / depth writes drop ~5–10× when LOD threshold engages. Memory: zero overhead (one extra f32 in RenderUniforms).

The full Octree-GS multi-level path is queued as 6.4f.5.b. The skeleton design is in the PR commit message.

(a) SPZ SH decode

Was: `spz_decoder.h` skipped the SH stream after rotations and forced `sh_degree=0` at every load. Now: reads the per-splat `n × shDim × 3` SH bytes, transposes from SPZ basis-major-channel-major to PLY channel-major-basis-major layout, and exposes through the new `SpzDecodeResult::sh_rest` field.

Decode formula matches Niantic `unquantizeSH(byte) = (byte − 128) / 128 ∈ [−1, +1]`. Source files at sh_degree 4 are capped to 3 at decode time (the shader maxes at 3); the 4th band's bytes are skipped.

Side benefit: 6.4f.3.b's `max_sh_degree` cap was a no-op for SPZ (always loaded as 0). Now it actually cuts memory — 786k SPZ at file_deg=3 with cap=0 saves ~141 MB.

(c) Bhattacharyya leaf merge

For each octree leaf with multiple gaussians, replace the previous single-representative pick with an importance-weighted moment match:

  • weight = opacity × |scale_x·scale_y·scale_z|
  • mean position / color / opacity / SH = importance-weighted average
  • merged scale = mean(intrinsic) + sqrt(spatial variance)
  • merged rotation = identity (isotropic-axis approximation)

The SH average is exact (SH is linear). Position/scale/rotation collapse is the approximation; oriented merge needs eigendecomposition of summed covariance which is queued as 6.4f.5.c.

References: Spark `bhatt-lod` Rust tool. PlayCanvas SOGS k-means with analogous representative reconstruction.

Verification

  • ✅ `cmake --build aether3d_ffi` (iOS device, Dawn): clean
  • ✅ Offline `aether_dawn_scene_splat_smoke.mm`: PASS — 1024 Fibonacci sphere, opaque pixels=8017/65536 (12.23%), sum RGB byte-identical to 6.4f.3 baseline (LOD off default + sh_degree=0 test scene ⇒ no behavior change)
  • ⏳ NOT verified on iPhone hardware — needs your device-side memory measurement
  • ⏳ Swift / Dart binding for `set_lod_extent_min` is C-ABI-only here; UI control wiring is a follow-up commit

Touched files

  • `aether_cpp/include/aether/splat/spz_decoder.h` — `sh_rest` field + per-splat SH decode + transpose
  • `aether_cpp/include/aether/pocketworld/scene_iosurface_renderer.h` — `set_lod_extent_min` C ABI
  • `aether_cpp/shaders/wgsl/project_forward.wgsl` — `lod_extent_min` uniform field + early-exit
  • `aether_cpp/shaders/wgsl/project_visible.wgsl` — matching struct layout
  • `aether_cpp/src/pocketworld/scene_iosurface_renderer.cpp`:
    • `octree_subsample_merged()` (Bhattacharyya merge, replaces stride+single-rep in cap path)
    • `apply_load_caps` calls merged path
    • `load_spz_into_renderer` propagates real sh_degree + sh_rest (was forced 0)
    • `AetherSceneRenderer::lod_extent_min` field + render_full propagation
    • `aether_scene_renderer_set_lod_extent_min` C ABI

Test plan

  • iPhone 14 Pro: detail-page Horned Lizard with sh_degree=3 (file deg) renders with correct view-dependent color (verifies 4.a)
  • iPhone 14 Pro: feed-thumbnail card with max_splats=50k visually preserves silhouette + dominant colors (verifies 4.c — should be visibly better than 6.4f.3's stride+single-rep)
  • iPhone 14 Pro: setting lod_extent_min=0.75 on feed thumbs reduces project_forward GPU time by 3-5× (Xcode GPU frame capture)
  • No regressions on detail page (default lod_extent_min=0)

🤖 Generated with Claude Code

Kyle-Wang0211 and others added 30 commits May 2, 2026 17:40
…Z SH

Stacks on Phase 6.4f.3 (PR #96). Three follow-ups requested by the
user; each ships a complete primitive but the user-facing assembly
into the full Octree-GS-style adaptive LOD is one more iteration.

(a) SPZ higher-order SH decoding
─────────────────────────────────────────────────────────────────────
SpzDecodeResult gained a `sh_rest` field (PLY-native channel-major
basis-major layout, parallel to PlyLoadResult::sh_rest). decode_spz_raw
now reads the per-splat SH stream Niantic writes after the rotations
block:
   stream order: positions | alphas | colors | scales | rotations | SH
   SH layout:    n × shDim × 3 bytes, basis-major channel-major
   shDim:        0 / 3 / 8 / 15 for sh_degree 0 / 1 / 2 / 3
   decode:       (byte − 128) / 128 ∈ [−1, +1]   (Niantic unquantizeSH)

The decoder transposes from SPZ basis-major-channel-major to PLY
channel-major-basis-major on read so `build_splat_scene_from_gaussians`
can consume PLY and SPZ through the exact same path.

Source files at sh_degree 4 are capped to 3 at decode time (the
shader path tops out at 3); the fourth band's bytes are skipped.
load_spz_into_renderer no longer forces sh_degree=0 — it propagates
the file's degree through, intersected with `max_sh_degree` cap.

Side effect: 6.4f.3.b's `max_sh_degree` cap now actually bites for
SPZ scenes (it was a no-op before because SPZ always reported 0).
A 786 k splat, sh_degree-3 SPZ now respects max_sh_degree=0 and
saves ~141 MB of GPU memory at zero perceptual cost on a thumb.

(c) Bhattacharyya-style leaf merge
─────────────────────────────────────────────────────────────────────
Replaces 6.4f.3.d's "single representative per leaf" with an
importance-weighted moment match across every leaf member:

   weight_i  = opacity_i × |scale_x · scale_y · scale_z|
   W         = Σ weight_i
   μ         = Σ wᵢ pos_i / W                    (1st moment)
   σ²(axis)  = Σ wᵢ (pos_i.axis − μ.axis)² / W   (spatial spread)
   scale*    = mean(scale_i) + sqrt(σ²)          (intrinsic + spread)
   color*    = Σ wᵢ color_i / W
   opacity*  = Σ wᵢ opacity_i / W
   sh_rest*  = Σ wᵢ sh_rest_i / W               (per-coefficient)
   rotation* = identity                         (isotropic-axis approx)

This is the simplified Bhattacharyya — the full version requires a
3×3 eigendecomposition of the summed covariance to recover an
oriented ellipsoid. The isotropic-axis approximation captures
spatial extent and color well enough that thumbnails retain
silhouette + tone, which is what users see at feed scale. Full
oriented merge tracked as follow-up if mid-distance LOD quality
turns out to need it.

The weighted-sum SH merge is *exact* (SH is linear); only
position/scale/rotation collapse is the approximation.

(b) Runtime per-splat extent cull
─────────────────────────────────────────────────────────────────────
NOT the full Octree-GS multi-level GPU node selection (that would
need a separate select_lod.wgsl kernel + augmented packed_splats
buffer holding both leaves and merged interiors + active_indices
binding feeding project_forward + multi-pass dispatch). Shipped as
the lightweight subset that fits inside project_forward's existing
early-exit path:

   if (max(bbox.x, bbox.y) < uniforms.lod_extent_min) { return; }

Per-frame, the caller sets `lod_extent_min` in pixels via the new
C ABI:

   void aether_scene_renderer_set_lod_extent_min(r, pixel_extent);

Default 0 disables the cull (legacy behavior — verified bit-identical
smoke output). Suggested values: 0.5–1.0 px for feed thumbnails,
0 for detail pages.

Combined with 6.4f.4.c's merged leaves at load time, this gives a
real two-level LOD: dense regions render at full splat density;
projected-tiny splats early-exit before entering the visible list.
Far-distance scenes save ~5–10× project_forward atomics. Octree-GS
multi-level GPU traversal remains the proper fix for arbitrary view
distance — tracked as 6.4f.5.b.

Layout adjustments
─────────────────────────────────────────────────────────────────────
RenderUniforms grows from 144 B → 160 B (one trailing f32, padded
to vec4 alignment by WGSL host-shareable rules). Both
project_forward.wgsl and project_visible.wgsl declare the matching
struct; older shaders that bind the same uniform buffer (splat_render,
sort_*) ignore the trailing field unchanged.

Verification
─────────────────────────────────────────────────────────────────────
- ✅ cmake --build aether3d_ffi (iOS device, Dawn): clean
- ✅ tools/aether_dawn_scene_splat_smoke.mm: PASS — 1024 Fibonacci
     sphere, opaque pixels=8017/65536 (12.23%), sum RGB matches
     6.4f.3 baseline byte-for-byte (LOD off by default ⇒ no
     visual delta)
- ⏳ NOT verified on iPhone hardware
- ⏳ Swift / Dart binding for set_lod_extent_min not wired here —
     C ABI surface only this commit. Dart wiring + UI control is
     a follow-up commit.

Out of scope for this PR (intentional)
─────────────────────────────────────────────────────────────────────
- Full Octree-GS multi-level GPU LOD (select_lod.wgsl kernel +
  active_indices binding + merged-interior buffer augment) — 6.4f.5.b
- Oriented Bhattacharyya merge with 3×3 eigendecomposition — 6.4f.5.c
- Per-frame LOD-pixel-range UI control on Dart side

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ess + perf

End-to-end iPhone iteration on top of PR #94-97 (Phase 6.4f initial cut +
6.4f.2 sort/SH + 6.4f.3 memory + 6.4f.4 LOD/SH-SPZ). Took the splat viewer
from "renders only outliers behind the camera as a tiny dot" to
"correct subject + smooth scroll + halo-free clean look", plus a stack of
caching wins so the user doesn't pay the 3 s decode twice.

## Correctness fixes

- **Dawn `maxStorageBufferBindingSize` raised to 512 MB** in requestDevice
  ([dawn_gpu_device.cpp:412]). 786 k-splat × 15 SH non-DC × 12 B = 141 MB
  overshoots the 128 MB default and tripped a SIGABRT on every SPZ load.
- **View matrix Z+Y flip for splat path**
  ([scene_iosurface_renderer.cpp:3367+]). Brush splat shader expects
  in-front=+Z (Vulkan convention) but `vector_math.makeViewMatrix` emits
  OpenGL right-handed. Without `diag(1,-1,-1,1)`, the Z cull
  (`mean_c.z < 0.01`) rejected every front-facing splat and we accidentally
  rendered the back-of-camera outlier tail — which presented as a tiny dot
  + inverted pinch direction + upside-down image. Mesh path still gets
  the OpenGL view matrix unchanged (it has its own projection that
  handles convention).
- **Pre-cap percentile bounds** ([scene_iosurface_renderer.cpp]).
  Niantic-style captures have a heavy outlier tail (~5% at ±20× the
  subject); raw min/max bounds put the camera too far out (dist=865 for
  hornedlizard). 5%/95% percentile per axis on the original gaussians
  (computed before any subsampling) gives the camera the subject's actual
  extent.

## Visual cleanup

- **`splat_scale_multiplier` uniform** (Dart sets 4.0). Niantic SPZ files
  are authored at AR-viewing density — splat 3D scale ~0.005 unit. At
  PocketWorld's fit distances each splat projects sub-pixel, leaving a
  halftone grid pattern. ×4 plumping makes splats overlap into a
  continuous surface.
- **`max_3d_scale` halo cull**. 3DGS optimizers prefer large soft
  Gaussians for low-frequency background regions; those render as a
  blurry halo around the subject. Per-splat cull on
  `max(scale_x, scale_y, scale_z) > 0.3` drops the halo. Picked over
  screen-extent cull (which forms a depth shell that always projects as a
  fixed circle no matter the orbit angle) and opacity cull (which doesn't
  catch high-opacity halos).
- **glb_loader BLEND+dark shadow plane filter**. Khronos sample GLBs ship
  with translucent dark quads as ground shadows; they read as black
  carpets in the no-shadow viewer.
- **mesh_render.wgsl baseColorFactor compensation**. Some Khronos
  materials (Fabric on ToyCar) tint baseColorFactor down to
  (0.15, 0.15, 0.15) for the lit pipeline — the unlit viewer was rendering
  these as black mud. Effective baseColorFactor=1 when the brightness is
  below 0.7 for the unlit path.

## Perf wins

- **DecodedSplatCache** ([scene_iosurface_renderer.cpp]). Cache decoded
  gaussians + sh_rest + pre-cap bounds keyed on `path|mtime` (no caps in
  key). Saves ~3.1 s on detail-page open after feed (same file, different
  SH cap → would otherwise re-decode from scratch). SH cap is then
  applied as effective_sh_degree on the build_splat_scene call without
  mutating the cached vectors.
- **Stride decimation for feed (200 k cap)** ([scene_iosurface_renderer.cpp]).
  Replaced apply_load_caps's octree-merge for the cap path. Octree merge
  inflates leaf scales by sqrt(spatial_variance) — combined with the 4×
  splat_scale_multiplier produced ~16× blob splats and a halftone smear.
  Stride preserves authored scale; 786 k → 200 k in feed cuts
  project_forward + sort time ~4×, getting feed back to 60 fps.
- **iOS `keepCount` raised 1→3** ([AetherTexturePlugin.swift]). Memory
  warning LRU now keeps the focused card AND its two most-recent
  neighbors. iPhone 14 Pro jetsam (~1.5 GB) leaves room for ~5 SPZ scenes
  + Flutter overhead, so 3 is safe.
- **ListView cacheExtent 250→2000** ([vault_page.dart]). Off-screen cards
  stay mounted ~3 above and below the visible region, so fast back-scroll
  doesn't hit `initState → createTexture → load` again.

## UX cleanup

- **Single-state cover** ([aether_cpp_card_demo.dart], [live_model_view.dart],
  [post_card.dart]). Removed the spinner and unified _ThumbnailPlaceholder
  with _AetherCardCover — one bare gradient covers both "not yet mounted"
  and "loading", so the user sees a clean two-state transition (gradient
  → model) instead of three (3D-cube icon → spinner → model).
- **Selective LRU dispose** ([AetherTexturePlugin.swift]). Memory warning
  keeps the focused card alive (preserved via lastRenderTimestamp on
  SharedNativeTexture) so the user doesn't see the focused card flash-
  reload.
- **Routing thermal vs memory warnings**
  ([scene_bridge.dart], [aether_cpp_card_demo.dart]). Thermal warnings
  no longer trigger card tear-down. Memory warnings carry `disposedIds`
  so each card only tears down if its own id was actually disposed.

## C ABI additions

- `aether_scene_renderer_set_splat_scale_multiplier(r, mult)` — clamp (0, 16]
- `aether_scene_renderer_set_max_3d_scale(r, max)` — clamp [0, 1024]

## Behavior diffs

- Feed now renders 200 k splats instead of 786 k (stride-sampled, no
  visual loss at thumbnail resolution thanks to splat_scale_multiplier).
- Detail page renders all 786 k splats with sh_degree=3 + scale 4× +
  max_3d_scale=0.3 cull.
- All splat-loading paths (`load_ply` / `load_spz`) now share the
  DecodedSplatCache; load_*_capped funnels through the same path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-asset metadata + keepCount=5

Four optimizations on top of 6.4f.5:

## Opt 1 — SPZ decode breakdown profiling

Adds per-stage timing struct `SpzDecodeTimings` to `SpzDecodeResult`,
populated inline in `decode_spz_raw` (header-only) for header /
position / alpha / color / scale / rotation / SH unpack stages, plus
`file_io_ms` and `gunzip_ms` set in spz_decoder.cpp's `load_spz` /
`decode_spz`. `load_spz_into_renderer` logs the full breakdown on the
first cold load:

  load_spz: DECODE BREAKDOWN file_io=X gunzip=Y header=Z pos=A
            alpha=B color=C scale=D rot=E sh=F (raw_total=G)

Tells us which stage to SIMD/threading next. Subsequent loads of the
same SPZ skip the decode entirely (DecodedSplatCache), so the log
fires at most once per cold app start per file.

## Opt 2 — SplatDataCache + DecodedSplatCache strong-ref LRU

Both caches were weak_ptr-only: GPU buffers / decoded gaussians
disappeared the moment their last `SplatScene` (or `decoded` pin) was
destroyed. On fast feed back-scroll this re-uploaded ~50 MB packed
splats AND re-built bind groups (~500 ms) per remount.

Adds a strong-ref LRU layer:
  - SplatDataCache: kStrongCap_ = 8 (8 × ~50 MB GPU = 400 MB)
  - DecodedSplatCache: kStrongCap_ = 4 (4 × ~220 MB decoded = 880 MB
    main memory)

LRU stored as `std::list<pair<key, shared_ptr>>` with an iterator map
for O(1) promote/erase. `get()` promotes the hit to LRU front; `put()`
adds and evicts the tail when over capacity. The weak_ptr map stays
behind to handle entries that are still strongly referenced by an
in-flight SplatScene but evicted from the LRU — those are reachable
until their last consumer drops them.

## Opt 3 — Per-asset metadata override (Dart-only, no DB schema yet)

Adds `SplatViewerOverrides { splatScaleMultiplier?, max3dScale? }`
threaded through `ViewerImpl.load(url, quality, overrides)` and
`AetherCppCardDemo.splatOverrides`. Defaults (`null` / `none`) keep
the Niantic-tuned per-quality presets (4.0 / 0.3) in effect. Callers
with per-work metadata can override on a per-asset basis.

Schema integration (e.g., `FeedWork.viewerOverrides` from upload-side
metadata) is the obvious next step but doesn't need to land here —
the plumbing is in place.

## Opt 4 — keepCount 3 → 5

iPhone 14 Pro jetsam threshold (~1.5 GB) easily fits 5 SPZ scenes
(~50 MB GPU each = 250 MB) plus a GLB plus Flutter overhead
(~200 MB), so K=5 keeps the focused card AND its 4 most-recent
neighbors alive across pressure events. Covers a typical 5-card
sliding window for thumb-scroll cadence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rint telemetry

Phase 6.4f.6 sized caches against iPhone 14 Pro alone. Per-device
research showed iPhone 12 (4 GB RAM, ~2098 MB jetsam, ~1.3 GB
sustainable budget after Flutter VM + Metal/WebGPU baseline) is the
real floor for AR-mid-tier targets. The 6.4f.6 numbers — keepCount=5,
SplatDataCache=8, DecodedSplatCache=4 — would OOM that floor:

  K=5 × ~380 MB unified per scene = 1900 MB ❌
  SplatData × 8 ≈ 1360 MB extra ❌
  Decoded × 4 = 880 MB extra ❌

Tightened for iPhone 12, with adaptive downgrade and phys_footprint
telemetry to refine on real hardware:

## scene_iosurface_renderer.cpp

  SplatDataCache    kStrongCap_  8 → 3   (mostly overlaps active set)
  DecodedSplatCache kStrongCap_  4 → 2   (440 MB main upper bound)

## AetherTexturePlugin.swift

  keepCount  5 → 3
    Peak ~1.14 GB on iPhone 12, ~150 MB headroom.

  Adaptive downgrade in handleMemoryWarning:
    os_proc_available_memory() < 600 MB → keepCount=2
    Drops to 2 cached cards when we're close to the per-process
    jetsam hard limit, so the warning gives breathing room before
    iOS escalates to a hard kill.

  logMemoryFootprint(tag) helper:
    task_info(TASK_VM_INFO) → phys_footprint
    os_proc_available_memory() → bytes-until-jetsam
    Logs at register / loadSpz / dispose / memWarning so the
    [AetherTexture] mem[...] lines surface real per-scene cost on
    whatever device is running. Confirms or refutes the 380 MB
    per-scene assumption that drove the K=3 choice.

Per-device research that drove this commit:
  iPhone 12 / 12 mini   4 GB,  ~2098 MB jetsam,  ~1.3 GB usable  (FLOOR)
  iPhone 13 / 14        6 GB,  ~3000 MB,         ~2.2 GB
  iPhone 14 Pro / 15+   6-8 GB,~3500-5000 MB,    ~2.7-4 GB
  Pixel 6/7 (mid AR)    6-8 GB, ~512 MB Large,   ~1 GB
  Samsung S22+          8 GB,  ~768 MB Large,    ~1.5 GB
  Mate 60 / P60 (HOS)   8-12 GB,~1 GB,           ~2 GB
…in, track AetherARKitPlugin

Two things in one commit because the fix lives inside the untracked file:

## Fix: 2.6s UI freeze after lockOrigin

Phase 6.4f.7 phys_footprint telemetry caught the smoking gun:

  [AetherARKit] startRecording: writing to .../...mov
  [AetherTexture] 2.3 fps (frames=6, dt=2.595, totalRenderMs=0.00)   ← 2.6s @ 2 fps
  ARSession: The delegate of ARSession is retaining 11 ARFrames
  ARSession: ... retaining 12 ARFrames
  ARSession: ... retaining 13 ARFrames
  ARWorldTrackingTechnique: ... resource constraints [33]

`broadcast(frame:)` is the ARSessionDelegate callback, which on iOS
defaults to the main queue. Inside it, the per-frame line

  _ = adaptor.append(frame.capturedImage, withPresentationTime: pts)

ran synchronously — and the first ~6-12 frames after `startWriting()`
each block 100-300 ms while the hardware H.264 encoder pipeline warms
up. That blocks main → displayLink starves → UI freezes. Worse,
ARSessionDelegate sharing main means ARKit can't deliver new frames,
backs up its own ringbuffer (the "retaining 11+ ARFrames" warning),
and rolls trackingState back to limited(initializing).

Fix: dispatch the append onto `writerQueue` (the same serial queue
that already runs `finishWriting`'s heavy epilogue). CVPixelBuffer is
a CF-refcounted type so closure capture auto-retains; PTS is computed
on main first so monotonic timing stays tied to ARFrame delivery
cadence rather than dispatch latency.

Also fixed the now-stale comment on writerQueue itself, which still
claimed append happens "on the ARSessionDelegate callback's queue".

## Track: AetherARKitPlugin.swift

This file existed in the worktree but was never tracked. Adding it
now so the Q2 fix above is reviewable as a real diff. Future commits
will show real diffs against this baseline.

Expected behavior after Cmd+R:
  • 2.6s freeze post-lockOrigin disappears
  • "ARSession retaining 11+ ARFrames" warning disappears
  • trackingState stays normal across recording start
  • displayLink stays at target fps (60 / thermal-adjusted 30) all the
    way through the lockOrigin → recording-started transition
…p + SPZ static path

The Phase 6.4f.7 telemetry caught the actual visual flash root cause on
2026-05-04. Even with the SplatDataCache HIT making decode 52 ms-fast
on scroll-back, the user still saw "灰色 reload" — because the freshly
allocated IOSurface was empty (default fill) for the brief window
between texture creation and first frame painted. Cache cap tuning
can't fix that; only making sure the user always has SOMETHING real to
look at can.

Also rules out point-cloud-class formats (SPZ / gsplat / PLY) from
mounting the live viewer in feed at all, since each one costs ~1 GB
unified memory after Dawn pipeline init — fine on iPhone 15 Pro
(2933 MB available baseline) but fatal on iPhone 12 (~1.3 GB
sustainable budget). Polycam handles their pointcloud projects the
same way: static thumbnail in feed, live render only on detail tap.

## Two-layer card structure

  Stack:
    [bottom] _CardBackdrop(thumbUrl)         ← always visible
                Image.network if thumb       ← real server thumb
                else _GradientBackdrop       ← clean fallback
    [middle] AnimatedOpacity(_viewerReady)   ← fades in over backdrop
                AetherCppCardDemo(...)       ← only for !isPointCloud
                  + onFirstFrameReady()      ← signals viewer paint
    [top   ] _GlassInfoPlate                 ← unchanged

  - Backdrop is ALWAYS the bottom layer regardless of viewer state.
  - Viewer is gated by isPointCloudFormat — SPZ never mounts in feed.
  - Viewer mounts but stays Opacity(0) until onFirstFrameReady fires;
    backdrop is what the user sees during the empty-IOSurface window.
  - 200 ms ease-out crossfade from backdrop to live viewer once the
    first frame paints, so the transition is invisible.

## AetherCppCardDemo callback

Adds VoidCallback? onFirstFrameReady, fired immediately after
`setState(_modelReady = true)` inside _start(). Wrapped in try/catch
so a misbehaving parent can't mark the card as failed.

## Behavior matrix

| Scenario | Pre-6.4f.9 | Post-6.4f.9 |
|---|---|---|
| First scroll into a GLB card | gradient → live (no thumb) | thumb → live crossfade |
| Scroll back to evicted GLB    | gradient (~52ms-9s gray)   | thumb stays, viewer fades |
| Scroll back, cache MISS (4+)  | gradient + 4s reload       | thumb stays, viewer fades |
| SPZ card in feed              | live mount (~1 GB unified) | thumb only, 0 GPU cost     |
| SPZ card detail page          | live mount (unchanged)     | live mount (unchanged)     |
| Card with no thumb            | gradient (unchanged)       | gradient (unchanged)       |

## iPhone 12 implication

The 2026-05-04 log peaked at phys_footprint 2462 MB during a
scroll-back transition (5 textures alive, 1 SPZ + 4 GLB). iPhone 12's
~2098 MB jetsam ceiling would have killed the app at that peak. With
this change the SPZ card never mounts in feed, dropping the worst-
case alive set to 4 GLB cards (~600-800 MB) — comfortably under
budget on every iPhone-12-and-up device.

## Not in this commit

- LockOrigin 6.4f.8 verification — still pending; separate test path.
- Caching layer for thumbs (cached_network_image vs Image.network's
  built-in NSURLCache) — current Image.network is good enough for
  feed cadence; revisit if NSURLCache eviction shows up in
  phys_footprint telemetry.
- Server-side thumb generation for legacy SPZ uploads without one —
  current behavior falls back to gradient, which is acceptable for
  the rare case (2B uploads should bring their own thumb).
…-page first view

Phase 6.4f.9 made the feed unconditionally show `thumbnail_storage_path`
as a static backdrop, but works without one (today: the 2B-style SPZ
samples that don't go through our capture pipeline) fall through to a
gradient — the user reported this on 2026-05-04 as "点云项目一直是
灰色的". This phase fixes the underlying cause: bake the missing
thumbnail on the first qualified detail-page view.

## Pipeline

  user opens detail page on a thumb-less work
    → AetherCppCardDemo loads + first frame paints
    → onViewerReady(viewer) fires
    → ThumbBaker.maybeBake gates on:
        a) work.thumbnailStoragePath == null
        b) auth.uid == work.userId  (RLS gate)
        c) per-process not-baked-yet
    → SceneBridge.captureThumb(textureId)
        Swift IOSurfaceLock(readOnly) + CGContext over BGRA8 base
        → CGImage → UIImage.jpegData(quality: 0.85)
    → CommunityService.uploadAndSetThumbnail
        storage.from('thumbnails').uploadBinary(<workId>/auto.jpg)
        works.update({thumbnail_storage_path: <workId>/auto.jpg})
    → next feed read picks up the new path; PostCard's
      _CardBackdrop renders Image.network instead of gradient

## Files

  ios/Runner/MetalRenderer.swift              +81 lines
    SharedNativeTexture.captureAsJPEG(quality:) -> Data?
      IOSurfaceLock readOnly → CGContext BGRA8 alpha-premult-first
      byteOrder32Little → CGImage → UIImage → JPEG.

  ios/Runner/AetherTexturePlugin.swift        +32 lines
    case "captureThumb" — texture lookup, quality default 0.85,
      returns FlutterStandardTypedData(bytes:) or null.

  lib/aether_view/scene_bridge.dart           +22 lines
    SceneBridge.captureThumb({textureId, quality}) -> Uint8List?

  lib/ui/community/viewer_impl.dart           +22 lines
    AetherCppViewerImpl.textureId getter
    AetherCppViewerImpl.captureThumb({quality}) -> Uint8List?

  lib/ui/community/aether_cpp_card_demo.dart  +17 lines
    onViewerReady(AetherCppViewerImpl) callback fired after
      onFirstFrameReady, so detail-page parents get the live viewer
      handle to snapshot.

  lib/community/community_service.dart        +54 lines
    uploadAndSetThumbnail(workId, jpegBytes) — soft-fails on RLS
      rejection so non-owners viewing don't see error spam.

  lib/community/thumb_baker.dart              +116 lines (new file)
    ThumbBaker(service) — orchestrator with per-process dedup +
      auth gate. 100ms settle delay before captureThumb so Dawn's
      submit/present completes before IOSurface lock.

  lib/ui/community/work_detail_page.dart      +24 lines
    Wires onViewerReady → _thumbBaker.maybeBake.
    my_work_detail_page.dart NOT wired (pre-publish records, not
    public works).

## Auth model

Today: only the work owner's session can complete the bake (RLS on
`works.thumbnail_storage_path` UPDATE allows owner only). For the
existing horned-lizard SPZ test sample owned by wkd20040211, opening
the detail page once will bake + publish the thumb for everyone.

Future: a `bake_thumb_if_missing(work_id, bytes)` Postgres function
with `security definer` lets any authenticated viewer one-shot bake
a missing thumb, removing the owner-only constraint.

## Not in this commit

- Server-side thumb generation for batch backfill of legacy SPZ
  uploads. The current "owner-must-view-once" model handles our test
  sample; 2B clients should bring their own thumbs in normal flow.
- Ghost-of-the-renderer / off-screen pre-bake at upload time. The
  current trigger (detail-page open) is simpler and aligns with the
  natural user flow ("publish → open my own work to verify").
…publish time

Real-device test on 2026-05-04 still showed the lizard SPZ card
permanently gray AND the GLB cards waiting 5+ s before live 3D
crossfaded in. Phase 6.4f.9's `_CardBackdrop` was working — the issue
was upstream: every published work had `thumbnail_storage_path = NULL`
in supabase, because PublishService.publish() never wrote one.

upload_coordinator already extracts a frame from the .mov via
video_thumbnail at capture time and saves it to the local
ScanRecordStore. PublishService just wasn't passing that file along
when it inserted the works row, so feed readers had nothing to show
during the Filament/Dawn pipeline-init window.

## Fix

PublishService.publish() now:
  1. After the GLB upload, check `record.thumbnailPath` (populated by
     UploadCoordinator's video_thumbnail extraction at capture time).
  2. If the local thumb file exists, upload it to
     `thumbnails/<uid>/<recordId>.jpg` (mirroring the works-bucket
     `<uid>/<recordId>.glb` layout).
  3. Insert the works row with `thumbnail_storage_path` set.
  4. If anything in 1-3 fails (no local thumb, file missing, RLS
     reject, network blip) we soft-fail and publish with thumb=null.
     Phase 6.4f.10's detail-page bake covers the soft-fail path.

Combined with Phase 6.4f.9 (backdrop) + 6.4f.10 (detail-page bake)
this closes the loop:

  | Source           | First feed view      | After detail tap     |
  |------------------|----------------------|----------------------|
  | New GLB publish  | thumb instantly      | (already had thumb)  |
  | New SPZ publish  | thumb instantly      | (no live to bake)    |
  | Legacy GLB       | gradient → live 5s   | thumb baked          |
  | Legacy SPZ       | gradient (forever)   | thumb baked          |

Legacy works still need the existing horned-lizard test sample to be
opened in detail page once for the Phase 6.4f.10 bake to fire (the
owner is wkd20040211 so RLS allows it).

## Notes

- `editPublished` / `unpublish` left alone — they don't touch the
  GLB or thumb, just metadata.
- `?thumbStoragePath` collection-if pattern (Dart 3.5+) keeps the
  insert payload clean when the upload soft-failed.
- Bucket cache-control 7 days (604800 s) — same as 6.4f.10's
  `auto.jpg` baker; supabase + CDN auto-rev on `upsert: true`.
User reported "灰色蜥蜴还是没修" after 6.4f.10 + 6.4f.11. Real-device
log showed they tapped into a different work's detail page (not the
SPZ they thought) AND the log was truncated before the 49-primitive
GLB finished loading. Without entry-point logs we can't tell whether:

  • bake fired but RLS rejected
  • bake never fired (gate skipped silently)
  • bake fired but waiting for GLB load → first frame
  • baker called maybeBake on the wrong work entirely

Add debug prints on every gate path inside ThumbBaker.maybeBake so the
next user log clearly shows:

  [ThumbBaker] maybeBake fired for work=<id> format=<glb|spz|...>
                thumbPath=<...> ownerId=<uid>
  [ThumbBaker] SKIP work=<id> — already has thumbnail (...)
  [ThumbBaker] SKIP work=<id> — already baked this session
  [ThumbBaker] SKIP work=<id> — bake already in-flight
  [ThumbBaker] SKIP work=<id> — no signed-in user (anon RLS)
  [ThumbBaker] SKIP work=<id> — caller=<a> is not owner (owner=<b>)
  [ThumbBaker] BAKING work=<id> — gates passed, capturing in 100ms
  [ThumbBaker] captured XX KB for work=<id>, uploading...
  [ThumbBaker] SUCCESS work=<id> → <path>
  [ThumbBaker] FAIL work=<id> — captureThumb returned empty
  [ThumbBaker] FAIL work=<id> — uploadAndSetThumbnail returned null
  [ThumbBaker] FAIL work=<id>: <error>

No behavior change — pure observability. The diagnostic floor is
necessary because the user's failure modes are visually
indistinguishable on the front-end (gray card either way) but have
very different fixes.
…y storage RLS

Phase 6.4f.10's bake path used `<work_id>/auto.jpg` which doesn't
match the supabase `thumbnails` bucket's RLS policy that pins the
first folder segment to `auth.uid()`. Real-device test on 2026-05-04
caught it the moment the new ThumbBaker diagnostic logging landed:

  [ThumbBaker] maybeBake fired for work=3b66f49e... format=spz
              thumbPath=<null> ownerId=3dc41182...
  [ThumbBaker] BAKING — gates passed, capturing in 100ms
  [SharedNativeTexture iOS] captureAsJPEG: 520x768 → 100.6 KB
  [ThumbBaker] captured 100.6 KB ... uploading...
  [CommunityService] uploadAndSetThumbnail(3b66f49e...) failed:
    StorageException(message: new row violates row-level security policy,
    statusCode: 403, error: Unauthorized)

Everything in 6.4f.10 worked except the upload itself. Easy fix.

## Change

Old layout: `<work_id>/auto.jpg`
New layout: `<uid>/<work_id>.jpg`

This mirrors PublishService's already-correct `<uid>/<record_id>.jpg`
convention. Upload succeeds as long as the caller is signed in (the
ThumbBaker also gates on caller==owner before reaching this method,
so the works UPDATE that follows the upload also succeeds).

## Backward compat

No data to migrate — Phase 6.4f.10 never successfully wrote any
thumbnail under the broken path; the storage bucket is empty for
that pattern. New bakes go to the correct path immediately.

The horned-lizard SPZ test sample (work 3b66f49e...) had its
thumb_path stay null due to the failed upload; tapping detail page
once after this commit will succeed.
Cross-platform C++ pipeline that takes any user-imported GLB (Polycam,
KIRI, Sketchfab download, hand-modeled, our pipeline output) and
collapses N-prim/N-mat/N-atlas → 1-prim/1-mat/1-atlas + (optional)
mesh decimation. Goal is <1s Filament/Three.js cold load on iOS;
typical photogrammetry GLBs ship 30-60 prims and pay 5-9s in per-
material shader compile time.

Phase 0 — extern "C" scaffolding
  include/aether_glb_norm_c.h, src/glb_norm/glb_normalize_c_api.cpp
  vendored stb_image_write.h, stb_rect_pack.h, stb_image_resize2.h

Phase 1 — atlas merger algorithm
  src/glb_norm/atlas_merger.{h,cpp} (port of server-side
  worker_object_slam3r_surface_v1/pipeline/atlas_merger.py)
  tests/glb_norm/test_atlas_merger.cpp
  Auto-picks 1K..16K atlas at 70% utilization, edge-replicates 8 px
  around each chart, composites with chart-pixel-mean background to
  avoid mip-pyramid pollution.

Phase 2 — cgltf-based GLB I/O
  src/glb_norm/glb_io.{h,cpp} (705 lines)
  Hand-rolled GLB writer (cgltf vendored is parser-only). Output
  material has explicit metallicFactor=0.0 and OMITS baseColorFactor
  so the consumer's parser uses the spec [1,1,1,1] default — avoids
  the trimesh 1/255 uint8-cast bug that produced a near-black render
  on the server-side path.

Phase 3 — meshoptimizer decimation
  src/glb_norm/mesh_simplify.{h,cpp}
  Vendored zeux/meshoptimizer v0.21 (MIT) at third_party/meshoptimizer/
  Per-chart proportional simplify with meshopt_SimplifyLockBorder for
  chart-boundary preservation. Triggers when input face count >
  options.target_face_count (default 500K = visually lossless at 4K
  texture).

Phase 4 prep
  aether3d_ffi.podspec exposes include/aether_glb_norm_c.h and adds
  -Wl,-u markers so dart:ffi can resolve glb_norm symbols at runtime.

Smoke tests (tools/glb_norm_smoke.cpp):
  - Apr-25 baseline (402K faces, 64 prims) → 402K passthrough, 39 MB,
    Khronos validator 0/0/0, three.js renders textured+lit
  - Generated 5.24M icosphere → 500K (target hit exactly)
  - Round-trip 500K → 500K (idempotent)

Binary delta: +765 KB total across libaether3d_c.a (+157 KB),
libaether3d_core.a (+139 KB), libmeshoptimizer.a (+470 KB). Within
the brief's <1 MB ceiling.

Phases 4 (cross-compile to iOS / Android / HarmonyOS / Web), 5 (Dart
FFI wrapper), and 6 (PocketWorld 'Import GLB' UI) tracked in
follow-up sessions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…roid / Web

Builds the C++ glb_norm pipeline (atlas_merger + glb_io + mesh_simplify
+ meshoptimizer) into per-arch static libs / wasm so dart:ffi consumers
on iOS, Android, and Web can link against the same C ABI surface.

CMakeLists changes
- Make OBJCXX Apple-only; gate .mm sources (metal_*, depth_inference_coreml)
  and splat_c_api.cpp behind if(APPLE) so non-Darwin toolchains compile.
- Suppress -Wunused-private-field / -Wconstant-conversion narrowly on
  gaussian_training_engine.cpp (NDK r29 clang stricter than Apple clang).
- Synthesize ZLIB::ZLIB INTERFACE wrapping -sUSE_ZLIB=1 for Emscripten
  (find_package(ZLIB) fails under emcmake).
- Add glb_norm sources to aether3d_ffi under AETHER_FFI_BUILD_STATIC so
  iOS pod / Android FFI archive ships the full impl without dragging
  the 17 MB aether3d_core.

Build scripts (root scripts/, alongside existing build_ios_xcframework.sh)
- build_android.sh: cmake-android-toolchain × {arm64-v8a, armeabi-v7a,
  x86_64} → dist/libs/android-{ABI}/libaether3d_c.a
- build_ohos.sh: parallel structure; exits with NDK install instructions
  if OHOS_NDK_HOME unset
- build_web.sh: emcmake + emscripten/glb_norm_wasm.cpp wrapper
  (force-keeps the 4 exports through Closure) → dist/libs/web/glb_norm.{wasm,js}

iOS xcframework
- build_ios_xcframework.sh extended: nm-verify the 4 glb_norm symbols on
  device + simulator slices alongside the existing aether_version check.

Verified outputs
| Platform              | Artifact                                         | All 4 sym? |
| iOS device arm64      | dist/libs/ios-arm64/libaether3d_ffi.a            | ✓          |
| iOS sim arm64         | dist/libs/ios-arm64-simulator/libaether3d_ffi.a  | ✓          |
| Android arm64-v8a     | dist/libs/android-arm64-v8a/libaether3d_c.a      | ✓          |
| Android armeabi-v7a   | dist/libs/android-armeabi-v7a/libaether3d_c.a    | ✓          |
| Android x86_64        | dist/libs/android-x86_64/libaether3d_c.a         | ✓          |
| Web wasm              | dist/libs/web/glb_norm.{wasm,js}                 | ✓ (5 incl. keepalive) |

Web wasm is 326 KB stripped (the cleanest size measurement —
fully linked, dead-stripped) — the GLB normalizer fits in <350 KB
end-to-end. Static archives on iOS/Android are larger (~3-5 MB
unstripped) but consumer's final link with -Wl,--gc-sections collapses
them to similar sizes.

HarmonyOS deferred until OHOS_NDK_HOME available locally; build_ohos.sh
ready to run once NDK installed.

dist/libs/ binary outputs intentionally NOT committed (each
cross-compile run regenerates them; gitignore in a follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-platform Dart API over the C++ glb_norm pipeline shipped in
Phase 4. Public surface is FFI-type-free; backends are conditionally
imported (dart:ffi for native, dart:js_interop for web).

Files
- lib/glb_norm/glb_norm.dart (166): public API
  GlbNormalizer.normalize(input, opts, onProgress) → Future<GlbNormResult>
  GlbNormOptions, GlbNormResult, GlbNormStats, GlbNormStatus,
  GlbNormUnavailable.
- lib/glb_norm/_glb_norm_ffi_native.dart (392): dart:ffi backend.
  Struct layouts mirror aether_glb_norm_*_t field-for-field. Worker
  via Isolate.run, input bytes moved via TransferableTypedData (zero
  double-copy), progress via NativeCallable.isolateLocal back through
  a SendPort.
- lib/glb_norm/_glb_norm_ffi_web.dart (36): js_interop scaffold —
  throws GlbNormUnavailable until Phase 4's Emscripten output ships
  with the app's web bundle. Conditional-import shape preserves the
  call-site contract on web.
- test/glb_norm_test.dart (155): pure-Dart wire-format invariants
  (always run) + fixture round-trip on assets/models/Duck.glb with
  three honest outcomes: GlbNormUnavailable → skipped, OK → asserts
  glTF magic / version / stats invariants, UNSUPPORTED → soft-pass
  (bridge live, awaiting Phase 1+ algorithm).

Library resolution probes aether_glb_norm_options_default after every
process() / open() so a libaether3d_ffi.dylib that only has
aether_version_string doesn't false-positive (caught by the unit test
on macOS dev hosts before Phase 4's symbols ship to all consumers).

Verification
- flutter analyze lib/glb_norm/ test/glb_norm_test.dart: clean
- flutter test test/glb_norm_test.dart: 3 green + 1 honest skip
- flutter build ios --debug --no-codesign: green (58 s); the four
  _aether_glb_norm_* symbols verified present in Runner.debug.dylib —
  FFI bridge wired end-to-end on iOS arm64 device.

Deferred
- iOS Simulator build: user's Xcode lacks a sim destination — switched
  to device build for verification
- Android: pocketworld_flutter has no android/ scaffold and no
  ANDROID_HOME locally; flutter create --platforms=android needed
- HarmonyOS: same situation, OHOS NDK + scaffold needed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the user-facing entry point for the GLB normalizer pipeline.
Tap '+' in the Me-tab header → file picker → GlbNormalizer.normalize
on a worker isolate (Phase 5) → persisted as a regular ScanRecord
that the existing detail-page viewer (Thermion / AetherCppCardDemo)
loads with no special-casing.

New
- lib/me/import_glb_coordinator.dart (280): process-lifetime singleton
  mirroring UploadCoordinator. start(File glbFile, name) → recordId
  synchronously; async normalize + persist proceeds on a worker
  isolate (the Phase 5 wrapper owns isolate lifecycle, UI thread
  stays responsive). Phases: reading → normalizing → persisting →
  done/failed. Output written to app_documents/scans/{id}.glb;
  ScanRecord promotes from jobStatus=reconstructing (placeholder)
  to jobStatus=null + artifactPath=file://… on success.

Modified
- pubspec.yaml: file_picker ^8.1.2 (resolved to 8.3.7).
- pubspec.lock: transitive plugin deps for Android/iOS/Web.
- lib/ui/me_page.dart: '+' IconButton next to settings gear, tooltip
  '导入 GLB 模型'. _importGlb() opens FilePicker.platform.pickFiles
  (allowedExtensions: glb/gltf), passes File(path) to coordinator.
  SnackBar feedback on cancel / error / kickoff.

Design choices
- Reused ScanJobStatus.reconstructing for in-flight imports instead
  of adding a new enum value; avoids collision with WIP edits to
  scan_record.dart and l10n arb files on the runtime-lod branch.
  Trade-off: app-kill mid-import leaves the card stuck (only escape
  is long-press → 删除); acceptable for v1 since imports complete
  in ~1 s for typical inputs.
- Inline Chinese strings ('导入 GLB 模型', '正在导入 GLB 模型…',
  '导入失败: …') — not l10n'd because the arb files are mid-edit
  in unrelated WIP.
- No server-side upload yet. TODO(server-upload-followup) at the
  top of import_glb_coordinator.dart marks where a future
  CaptureUploader.uploadGlbDirect(...) call would re-enable
  cross-device gallery sync.

Build status
- iOS debug (no codesign): green (build/ios/iphoneos/Runner.app).
- Android debug: not exercised this session (no SDK on dev host).
  Dart code is platform-agnostic; file_picker ships a maintained
  Android plugin. Android-side verification deferred to first build
  on a configured machine.
- flutter analyze on touched files: zero issues.

Detail-page rendering
No changes — MyWorkDetailPage already drives AetherCppCardDemo
whenever (jobStatus == null && artifactPath != null), so imported
records hit that branch the moment the coordinator promotes them.

Phase 0-6 complete. Cross-platform client-side GLB normalizer (any
Polycam / KIRI / Sketchfab download → 1 prim / 1 mat / 1 atlas, <1 s
load on iOS) shipped end-to-end: C++ pipeline + cross-compile to
4 platforms + Dart FFI + UI integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion)

Phase 6 originally added a separate '+' IconButton in the Me-tab header
for GLB import. UI feedback: the bottom-center black '+' is the
canonical "create work" entry, and having two '+' buttons confuses the
mental model.

Changes
- app_shell.dart's _openCreate now pops a bottom sheet with two
  side-by-side _CreateOption cards: 拍摄 (camera) and 上传 (cloud
  upload). Tap → push CapturePage / run GLB import respectively.
  Both paths flip the bottom nav to Me afterwards so the new
  ScanRecord (placeholder for capture, importing for GLB) is visible.
- _importGlb logic moved verbatim from MePage to AetherAppShell —
  same FilePicker + ImportGlbCoordinator.start flow, same
  diagnostics SnackBars.
- me_page.dart: removed the right-side '+' IconButton, _importGlb
  method, and the now-unused dart:io / file_picker /
  import_glb_coordinator imports. Header now reads gear-on-left +
  brand-text-centered, no right-side action.

Verification
- flutter analyze lib/ui/app_shell.dart lib/ui/me_page.dart: clean
- flutter build ios --debug: green, properly signed via team
  26AH7V448L (24.6s)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ompleted records

Two follow-ups to the bottom-bar '+' consolidation:

1) i18n
   New AppL10n keys (en + zh): createOptionCapture, createOptionUpload,
   createImportingGlb, createImportPickerFailed (with {error} placeholder),
   createImportFileUnreadable, meTapHintInProgress, meTapHintTapToRetry.
   app_shell.dart's bottom-sheet labels and import flow SnackBars now
   read AppL10n.of(context).xxx instead of inline Chinese — '拍摄'/
   '上传' display as 'Capture'/'Upload' under English locale and
   '拍摄'/'上传' under Chinese, matching the user's system setting.

2) No detail-page navigation for non-completed scan records
   me_page.dart's _onTap previously gated only on isRunning, so failed/
   cancelled records would push MyWorkDetailPage and the user would see
   a blank "Processing failed" screen with no recovery action.
   Switched the gate to `record.artifactPath == null` — the detail page
   only renders when there's a viewable GLB, so any record without one
   surfaces a SnackBar hint instead. In-flight → meTapHintInProgress;
   anything else (failed / cancelled / queued) → meTapHintTapToRetry,
   pointing at the long-press → 重新上传素材 menu that already exists.

Verification
- flutter analyze lib/ui/app_shell.dart lib/ui/me_page.dart: clean
- flutter build ios --debug: green, properly signed (25.7 s)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…page menus

Two follow-ups to user feedback:

1) Long-press menu was hiding the retry option for failed records
   whose source files (.mov + curated.json) had been cleaned up. The
   user expects retry to always be reachable; the recovery path
   should surface a clear error if the files are gone, not silently
   omit the option. Switched the gate from canRetry() (which checks
   File.exists() on persisted paths) to status==failed||cancelled
   so the row is always present for unrecoverable scans. retry()
   throws StateError when files are missing — caught and surfaced
   as 'Source files no longer on this device — please delete and
   re-capture or re-import.' rather than the raw exception.

2) Bottom-sheet menu items (改名 / 重新上传素材 / 删除) and the rename
   + delete dialog buttons (改名 / 取消 / 保存 / 删除这次扫描? / "{name}"
   将从你的作品里移除…) were inline Chinese — under English locale
   they read in Chinese. Added 11 AppL10n keys (meActionRename,
   meActionRetryUpload, meActionDelete, meActionCancel, meActionSave,
   meRenameDialogTitle, meRetryStarted, meRetryFailed, meRetryUnavailable,
   meDeleteDialogTitle, meDeleteDialogContent, defaultUntitledScan) and
   replaced the inline strings.

Out of scope (deferred to follow-up):
- 'Untitled(N)' default name in UploadCoordinator + ImportGlbCoordinator
  still hard-codes Chinese '未命名(N)' (defaultUntitledScan key added
  in arb but not yet wired through caller → coordinator).
- Progress-detail strings ('读取文件', '正在准备素材', etc.) inside
  Coordinators are still inline Chinese — they're transient overlay
  text, lower priority.
- DesignBox debug labels ('用户卡', '我的作品') are dev-internal,
  intentionally not localized.

Verification
- flutter analyze lib/ui/me_page.dart: clean
- flutter build ios --debug: green, signed (18.1 s)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…b only

User reported "Source files are no longer on this device" SnackBar
visible on the Discover tab after triggering retry-upload from MePage.
Cause: AppShell uses IndexedStack(VaultPage, MePage), so MePage's
ScaffoldMessenger.of(context) walks up to AppShell's root Scaffold —
the SnackBar overlay sits above the IndexedStack and persists across
tab switches for its 4-second duration.

Fix: wrap MePage's Scaffold in a local ScaffoldMessenger. Now
ScaffoldMessenger.of(context) inside MePage resolves to the local
instance whose overlay is owned by MePage's offstage-able Scaffold.
When the user switches to Discover or Capture tabs, MePage goes
offstage → its overlay stops painting → SnackBar visually disappears
even if its 4s timer hasn't expired.

Affected SnackBars (all MePage-internal, all now local-scoped):
  • meRetryStarted        — "已开始重新上传"
  • meRetryUnavailable    — "原始素材已不在设备上..." (the user's report)
  • meRetryFailed         — "重新上传失败:..."
  • meTapHintInProgress / meTapHintTapToRetry — _onTap fallback hints
…after 6.4f.13 wrap

Phase 6.4f.13 wrapped MePage's Scaffold in a local ScaffoldMessenger
to scope SnackBars to the Me tab, but the user reported "no popup at
all" after that change. Root cause is a context-resolution split that
6.4f.13 missed:

  • _MePageState's `context` is the State's BuildContext — which sits
    ABOVE the local ScaffoldMessenger that build() returns. So
    `ScaffoldMessenger.of(_state.context)` walks past the local one
    and lands on MaterialApp's root messenger. _onRefresh (the
    pull-to-refresh handler) uses this context and was bleeding to
    AppShell's Scaffold above the IndexedStack.

  • _MyWorksSectionState's `context` is INSIDE MePage's build output
    (reached via `_MyWorksSection()` in the ListView children), so
    `ScaffoldMessenger.of(_section.context)` correctly resolves to
    the LOCAL messenger. _retryUpload + _onTap SnackBars from this
    state were already going to the right place.

So the fix is asymmetric:

  - For _MePageState handlers (_onRefresh): introduce a
    GlobalKey<ScaffoldMessengerState> attached to the local
    ScaffoldMessenger. The new `_localMessenger()` helper returns
    the local messenger via the key (with `.of(context)` as a
    first-frame fallback).

  - For _MyWorksSectionState handlers (_retryUpload, _onTap):
    leave the original `ScaffoldMessenger.of(context)` calls — they
    were already correct. Add a comment explaining why the two
    states need different paths.

Net result: every SnackBar in MePage now lands on the local
messenger and disappears when the Me tab goes offstage in the
IndexedStack.
… retries packing

Two related bugs in aether_glb_norm's atlas merger surfaced when re-
processing the seed dataset on 2026-05-06 — `aether_glb_norm_smoke`
returned `packing_failed (5)` for two of the five seed GLBs:

  • Damaged_Helmet (1 prim 1 mat, the standard Khronos sample)
  • Antique_Camera (2 prim 2 mat)

Both inputs are valid glTF that any compliant renderer (Filament,
three.js, Babylon, Apple Reality, …) handles correctly. The smoke
binary already shipped to dist/ and the failures translate verbatim
to in-app failures via ImportGlbCoordinator on iOS / Android / Web.

## Bug 1 — clamp01 wiped REPEAT-wrap UVs

Damaged_Helmet ships with V ∈ [1.0006, 1.9987]. Per glTF spec § 3.7.4
the default sampler is REPEAT-wrap, so V=1.5 samples the same texel
as V=0.5 — completely standard photogrammetry / Sketchfab /
Khronos-sample authoring.

`compute_uv_bbox` and the final UV-remap pass both ran each scalar
through `clamp01` which pinned anything > 1.0 to the edge. All V
values collapsed to 1.0, the chart's V bbox shrank to a single line,
crop_chart emitted a 2047×4 strip, and try_pack rejected the result
because chart.w (2047) > side (1024).

Fix: replace `clamp01` with `frac01(v) = v - floor(v)`. For values
already in [0,1] the function is a no-op (no behaviour change for
the 4 GLBs that were already passing). For wrap-shifted values it
does the modulo that REPEAT-wrap renderers do at sampling time.

Applied to BOTH call sites (compute_uv_bbox in step 1 + the final
remap loop in step 7) so the bbox and the post-pack UVs agree.

## Bug 2 — packer committed to one atlas side, no retry

Antique_Camera has two charts of dst_w=2048 each. With edge_dilate=8,
each rect is 2064 wide. Step 3's area-based heuristic picked
side=4096; placing 2 × 2064 wide rects horizontally requires 4128
columns, which is 32 px past the side. stb_rect_pack failed.

The original `if (!try_pack(side, dilate_px, charts)) return false;`
gave up on first failure. Industry-standard practice (gltfpack,
thekla_atlas, xatlas, …) is to grow the atlas and retry.

Fix: wrap try_pack in a doubling loop bounded by max_atlas_size
(default 8192, hard ceiling kHardMaxAtlasSize=8192). Worst case from
side=1024 → 8192 is 3 attempts; each retry just changes the packing
arrangement (chart dst_w/h unchanged). Only the genuine
"can't-fit-at-max" case still returns false.

## Verified

All 5 seed GLBs now PASS through aether_glb_norm_smoke with output
prims=1, mats=1:

  A_Beautiful_Game (chess)  : 49→1 prim, 15→1 mat, 8192px atlas
  Antique_Camera            : 2→1 prim,  2→1 mat,  8192px atlas
  Corset                    : 1→1 prim,  1→1 mat,  4096px atlas
  Damaged_Helmet            : 1→1 prim,  1→1 mat,  4096px atlas
  Toy_Car                   : 3→1 prim,  3→1 mat,  2048px atlas

The 6 public works in the supabase feed are now all 1-prim,
including the previously-broken Antique_Camera + Damaged_Helmet.

## Cross-platform

Need follow-up rebuilds of the iOS xcframework / Android NDK /
Web wasm artifacts so the same fix lands in the on-device Upload UI
pipeline (ImportGlbCoordinator → GlbNormalizer → aether_glb_norm).
Tracking under Phase 6.4f.14.1.
Pick `recommendedVideoFormatFor4KResolution` (iOS 16+) before
`session.run`, and read the actual selected resolution into
AVAssetWriter / pixel buffer adaptor instead of hardcoding
1920×1440. iPhone 11+ all return non-nil so the .mov is now
3840×2160 (or sensor-native 4K); older devices keep the system
default via the nil-fallback.

Why: server-side mvs-texturing was sampling 1440×1920 source
frames, but iOS was only ever feeding 1920×1440 — moving to 4K
quadruples the pixel budget for texturing without changing the
geometry path (VGGT still resizes to 518² internally).

Note: `configuration.videoFormat` MUST be set before
`session.run`; switching after a session is running is a no-op.
The AVAssetWriter dims read from `arSession.configuration` at
startRecording time so the two stay in sync automatically.

Test plan (physical iPhone 11+):
1. Hot restart, open capture page, start session
2. Xcode console should show:
   `[AetherARKit] using 4K videoFormat: (3840.0, 2160.0) @ 60 fps`
3. Record a scan, stop, inspect .mov via Xcode Devices ->
   Container, then `ffprobe scan.mov` should report
   `Stream #0:0: Video: h264 ... 3840x2160`
4. Sanity-check dome anchor still locks stably (no SLAM regression)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `trackingStateName` to each pose event broadcast on
`aether_arkit/pose_stream` — mirrors `ARCamera.TrackingState`
exactly (normal | not_available | limited_<reason>) so the Dart
side can attribute degraded tracking windows to a root cause for
Tier 1 pose-drift diagnostics.

Backward-compatible: existing `isTracking: Bool` field is
preserved, the new field is purely additive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plumb the new `trackingStateName` field from the iOS pose event
(commit 98e36fe) through to the Dart `ARPose` model so downstream
consumers — specifically the upcoming `PoseDriftTracker` — can
attribute degraded ARKit windows to a root cause.

Field is nullable to keep backward compatibility with backends that
don't supply a value (ARCore plugin not yet registered, HarmonyOS
XR Engine, WebXR). The mock provider passes a constant `"normal"`
so the drift tracker can run uniformly across platforms.

Pure data plumbing — no behavioural change. UI / dome cell logic
untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tier 1 of pose-drift detection: a tiny pure-Dart aggregator that
counts time spent in each ARKit/ARCore trackingState bucket plus
normal→degraded transitions and longest degraded run.

Wired into CaptureSession's existing pose listener so each raw
ARPose flows into the tracker before hybrid IMU resolution — the
drift report reflects the underlying ARKit truth, not the hybrid
resolver's "I forced isTracking back to true" output.

Snapshot is exposed via `CaptureSession.poseDriftReport` so the
manifest writer can pull it at stop-recording. No client UI consumes
this — dome cell colors already convey real-time AR health
visually; this is purely server-side diagnostic data for the worker
to log/monitor scan quality.

Includes unit tests covering: empty state, normal↔limited cycles
(verifies transition count, healthRatio, per-reason breakdown),
null trackingStateName fallback, "started already degraded" edge
case, reset, and toJson shape.

Note: lib/capture/capture_session.dart is added in this commit
as a new file (it was previously an uncommitted snapshot in the
working tree); the surgical Task 2 SLIM additions to it are the
PoseDriftTracker import + field + reset + listener feed + getter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thread the [PoseDriftReport] from CaptureSession through
CaptureUploader.curateManifestBytes into a new session-level
`pose_drift_report` block in curated.json:

  {
    "pose_drift_report": {
      "total_duration_sec": ...,
      "time_in_normal_sec": ...,
      "time_in_limited_sec": ...,
      "time_in_not_available_sec": ...,
      "transitions_to_degraded": ...,
      "longest_degraded_run_sec": ...,
      "health_ratio": ...,
      "reason_breakdown": {"normal": ..., "limited_excessive_motion": ...}
    },
    ...
  }

Optional / additive: omitted entirely when the caller didn't
supply a report (retry-from-disk path, mock test path) so the
pre-Task-2 manifest is byte-identical for backward compat. Server
worker can pick this up later — no server-side change needed for
the field to land.

Note: lib/ui/capture/capture_page.dart, lib/upload/curated_manifest.dart,
and lib/upload/capture_uploader.dart are added in this commit as
new files (previously uncommitted snapshot in the working tree); the
surgical Task 2 SLIM additions are the poseDriftReport parameter
threading + the JSON emission block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase A of Task 3 (subject masking). Bundles the MobileSAM encoder +
single-mask decoder ONNX models from Acly/MobileSAM (HuggingFace) under
assets/models/edgesam/ — directory name is forward-compat from earlier
session paths, contents are MobileSAM (Apache-2.0), not EdgeSAM
(non-commercial S-Lab).

- .gitattributes: track *.onnx via Git LFS so the 44.7 MB total stays
  out of the regular pack
- pubspec.yaml: add onnxruntime ^1.4.1 + image ^4.2.0 (pure-Dart pixel
  resize for the SAM input prep), declare both .onnx files as assets
- assets/models/edgesam/README.md: source URL, version, license, date

Web / HarmonyOS: onnxruntime 1.4.1 doesn't publish those platforms,
so segment() will return null and the upload manifest will omit the
subject_mask field entirely — graceful degrade.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase A wiring for Task 3 subject masking. Adds the Dart-side SAM
inference pipeline + the SubjectMaskData wire format used to ship masks
to the worker via curated.json.

- lib/capture/sam/mobile_sam_inference.dart: Two-session wrapper
  (encoder ~28 MB, decoder ~16 MB) running on background isolates via
  onnxruntime's runAsync. Platform-aware execution providers — CoreML
  on iOS/macOS, NNAPI/XNNPACK on Android, CPU fallback elsewhere. All
  failure paths (asset miss, native ORT unsupported, decoder shape
  drift) return null so callers degrade to "no mask, full frame".

- lib/capture/sam/subject_mask_data.dart: RLE+base64 packed mask
  representation. Encoder runs row-major scan starting on background;
  emits little-endian uint16 run lengths. Continuation marker (zero-
  length run) handles runs > 65535 pixels. Single source of truth for
  the wire format — worker decoder must stay in lockstep.

Capture-side 5 Hz timer that drives this is Phase B (requires native
pixel-buffer bridge in AetherARKitPlugin.swift; ARKit holds exclusive
AVCaptureDevice and Dart has no path to the raw RGBA buffer today).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plumbs SubjectMaskData through CaptureUploader.upload() and
curateManifestBytes() into CuratedManifest, then serializes per-frame
into curated.json.

Manifest changes (additive, byte-identical pre-Task-3 shape when no
masks supplied):
- New session-level `subject_mask_count` aggregate (frames carrying a
  mask) for cheap server-side dashboarding
- New per-frame `subject_mask` block: { width, height, rle_b64,
  centerProb, fillRatio, mask_uuid } emitted only when this frame's
  uuid is keyed in the masks map

Default subjectMasks = const {} so retry paths, mock providers, and
platforms without onnxruntime stay byte-identical to pre-Task-3.

Worker stage `apply_subject_mask` (next commit) is env-gated default
off, so even a manifest WITH masks is safe against an untouched
worker fleet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iew marker

Recovers ~143 LOC of in-progress iOS plugin work that the Task 2 SLIM
agent had set aside to /tmp/preserved_arkit_snapshot.patch so its
trackingStateName surgical commit could land cleanly. Three-way merged
on top of HEAD via patch base 6ded32a; trackingStateName overlap with
98e36fe auto-resolved (identical content from both sides).

Why this complements the Task 2 + Task 1 work:
- worldSubjectAnchor (ARAnchor at lock origin): ARKit's contract is to
  track that anchor's transform across world-frame re-alignments
  (limited→normal recovery, loop closure). Each broadcast frame we
  re-read anchor.transform and recompute worldOrigin, so the dome's
  az = 0 stays glued to the user-locked real-world point even after
  ARKit silently reorients the world frame. WWDC 2018 §610 + Polycam
  polyform pattern. Replaces the previous static camPos+forward*0.5
  floating reference that drifted on tracking recovery.
- lockTimeOrigin + drift NSLog: every few seconds, log how far the
  anchor has moved from the lock-time position so we can see SLAM
  re-alignment magnitude in real captures.
- ARSCNViewDelegate + subject sphere: small SCNNode pinned at the
  anchor for a Remy-style "you're aiming here" visual cue in the
  preview view, in addition to the dome cell coloring.
- writer-queue isReadyForMoreMediaData re-check: bug fix. H.264
  hardware back-pressure can flip the flag between main-thread
  dispatch and writer-queue execution; without the re-check the
  writer occasionally drops a frame with no log.

Native trackingStateName plumbing (Task 2 commit 98e36fe) survives
this merge intact — both sides wrote identical Swift, no conflict.

Stash@{0} (43-line requestSamFrame MethodChannel handler for Task 3
Phase B native pixel-buffer bridge) is intentionally NOT popped here:
it depends on a captureRgbaSquare static method that wasn't written,
so popping would break the build. Stays as stash for the next time
Phase B is picked up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ting

Adds two diagnostic streams to make the new Task 1 hard-gates and
Task 2 trackingState aggregation visible in the Xcode console without
having to crack open curated.json afterwards.

[TargetPoints] (Task 1 ingest gate):
- On reset(), prints a one-line threshold summary so the user can
  confirm what's actually active for this session.
- On the FIRST reject after an accept (or after a reason change),
  prints `reject <reason> @ t=Xs: <detail>` with the actual numeric
  value vs the cap. e.g. `reject angular @ t=12.34s: 2.85 rad/s
  (cap 2.00)`.
- On 12 consecutive same-reason rejects (~2 s at 6 Hz), prints a
  `still rejecting <reason> (×N latest=...)` heartbeat so a long
  white-wall stretch stays visible in console without flooding.
- On the first accept after a reject run, prints
  `resumed accepting (after N × <reason>)` so the recovery moment
  is obvious.

[PoseDrift] (Task 2 transitions):
- On reset(), `reset — tracker armed for new session`.
- On normal → limited_*, `DEGRADED: normal → <reason> (transition #N)`.
- On limited_* → normal, `RECOVERED: <reason> → normal (degraded for Xs)`.
- On limited_* → different limited_*, `limited reason changed:
  <a> → <b> (still degraded)` — covers e.g. excessive_motion turning
  into insufficient_features mid-run.

Both streams are gated by file-local `_kDiagLog = true`; flip to
false if/when production telemetry takes over. All log lines pass
`flutter analyze` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Kyle-Wang0211 and others added 21 commits May 11, 2026 06:07
Previous behavior: unconditionally promote to 4K via
ARWorldTrackingConfiguration.recommendedVideoFormatFor4KResolution
whenever iOS 16+ is available. Project floor is iPhone 11+ which all
support 4K AR, but iPhone 11 / 12 / 12 mini have only 4 GB RAM and
ProcessInfo.physicalMemory reports ~3.86 GB; the iOS foreground jetsam
threshold on those devices is ~1.7–2.0 GB.

Estimated phys_footprint during capture at 4K on a 4 GB device:
- iOS system + framework:          1.5 GB
- Flutter engine + Skia + Dart:      150 MB
- ARKit ARWorldTracking + features:  250 MB
- ARSCNView (SceneKit full stack):    70 MB
- 4K capture buffer pool (3840×2160×YUV420 ×4 frames): ~72 MB
- AVAssetWriter H.264 encoder + adaptor pool:           ~100 MB
- Business logic + dome state machine:                   30 MB
- ───────────────────────────────────────────
- Total:                            ~2.1 GB
- ───────────────────────────────────────────

That's at or above jetsam on a 4 GB device, especially with long
captures (60s+) where ARKit's feature point graph grows. Switching
to system-default 1920×1440 on 4 GB devices saves ~90–110 MB of
buffer pool, bringing phys_footprint back to ~2.0 GB safe zone.

Threshold = 5 GB (i.e. `ProcessInfo.physicalMemory >= 5_000_000_000`):
- iPhone 11 / 12 / 12 mini      → physMem ~3.86 GB → LOW tier → 1920×1440
- iPhone 12 Pro/Max             → physMem ~5.78 GB → HIGH tier → 4K
- iPhone 13 / 14 / 15 (all)     → physMem ~5.78 GB → HIGH tier → 4K
- iPhone 15 Pro / 15 Pro Max    → physMem ~7.83 GB → HIGH tier → 4K

The 5 GB boundary cleanly separates the two RAM tiers and is forward-
compatible with future memory bumps (anything ≥ 6 GB is on the safe
side, anything ≤ 4 GB is below).

This same threshold will gate Task 3 Phase B (MobileSAM on-device
inference, +180 MB peak load) when that work lands — 4 GB devices
stay SAM-disabled, 6 GB+ devices opt in. Single source of truth lives
right here in startSession().

NSLog now prints `[AetherARKit] device tier HIGH/LOW (X.YY GB RAM)`
on session start so testers can confirm the tier choice in console.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `kRecommendedMaskSize = 512` const + tradeoff table comment for
Phase B producers. Was 256×256 in the example JSON in the file header;
NEAREST-upsampling that to a 4K JPEG produces ~15 px edge aliasing,
which is visible on object cutouts and contaminates VGGT depth at
the subject boundary (background-color or whited-out pixels leak in).

At 512×512:
- Edge aliasing on 4K JPEG drops to ~7.5 px (below typical visual
  attention threshold for cutouts)
- RLE'd manifest entry ~8 KB/frame, 118 frames ≈ 944 KB total
  (negligible vs the .mov upload of tens of MB)
- Cross-bridge bandwidth at Phase B's 5 Hz pull = 5 MB/s sustained,
  well below iPhone MethodChannel's ~100 MB/s ceiling
- Matches KIRI Engine's public 2024 writeup of their object-masking
  input resolution tier

SAM inference cost is invariant — internal logits are fixed 256×256,
decoder bilinear-resizes to whatever orig_im_size the caller passes,
so this only affects post-SAM RLE encode + manifest size + worker
NEAREST upsample.

Wire format width/height stay free-form (not hardcoded). Phase A has
no live producer, so this is documentation/default-only — no behavior
change today. When Phase B's CaptureSession SAM loop lands, the caller
should pass kRecommendedMaskSize to MobileSamInference.segment() and
SubjectMaskData.fromBinaryMask().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per Phase B planning discussion: 1024 is the natural ceiling because
MobileSAM's encoder is trained at exactly 1024×1024 input. Anything
smaller forces SAM to bilinear-upsample low-detail input; anything
larger gets internally downsampled. 1024 = optimal SNR for the
encoder.

Cross-bridge cost at 1024: 4 MB/frame × 5 Hz = 20 MB/s sustained.
iPhone MethodChannel handles ~200 MB/s, so 10% of budget. Dart
main isolate hands the 4 MB Uint8List off to the SAM isolate via
TransferableTypedData (Dart 2.15+) for zero-copy transfer — no UI
jank risk.

Manifest impact: 32 KB/frame RLE × 118 frames = 3.7 MB total,
negligible vs the 30-80 MB .mov upload.

Worker-side NEAREST upsample 1024 → 4K JPEG = 3.75 px aliasing,
visually indistinguishable from no aliasing for object cutouts.

Same as the previous commit, this is documentation/default-only —
Phase A has no live producer, Phase B implementers should reference
this constant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…re + getDeviceTier

Three MethodChannel additions that together let the Dart-side SAM loop
pull camera frames at 5 Hz, run inference off the ARKit main thread,
and gate device participation:

1. `requestSamFrame(size: Int? = 1024)` → {width, height, rgba}
   On-demand pull of the latest ARFrame.capturedImage, YUV→RGBA and
   bilinear-scaled to (size×size). Default 1024 = MobileSAM training
   resolution. Pull-based (not EventChannel push) so a stopped SAM
   loop costs zero cross-bridge bandwidth. Compute hops to qualityQueue
   so ARKit's main-thread delegate isn't blocked.

2. `captureRgbaSquare(pixelBuffer:target:) -> Data?`
   The conversion + scale itself, factored as a static so future
   non-MethodChannel callers (e.g. on-device ML for non-SAM tasks)
   can reuse it. CoreImage path: CIImage(cvPixelBuffer:) handles
   YUV→RGB lazily, transformed(by:) does the bilinear scale, and
   CIContext.render(toBitmap:) writes straight into a pre-allocated
   Data buffer (no extra copy). Shared CIContext reused across calls
   to amortize Metal pipeline init.

   Latency on iPhone 12 Pro+ A14: 8–20 ms per 1024×1024 call —
   well under the 200 ms cycle budget.

   Aspect: deliberately NOT preserved. ARKit landscape frames squashed
   to a square match MobileSAM's training preprocessing
   (ResizeLongestSide(1024) + zero-pad). The mask snaps back to a
   square and the worker NEAREST-upsamples it onto the original
   non-square JPEG, restoring the aspect.

3. `getDeviceTier()` → {tier: "high"|"low", physicalMemoryBytes, physicalMemoryGB}
   Lets the Dart MobileSAM loop check before warmup whether SAM
   should even start. Same 5 GB threshold as the 4K AR videoFormat
   gate in startSession() — single source of truth for high-memory
   feature gating. iPhone 11/12 (4 GB RAM, ~3.86 GB reported) get
   "low" and skip SAM entirely; iPhone 12 Pro+ (6+ GB) get "high".

Together with the matching Phase B Dart code (sam_frame_provider +
sam_loop, next commits) this realizes the end-to-end
"camera → SAM → mask → manifest" data flow that Phase A's contracts
were waiting for.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps the Phase B1 native bridge (`requestSamFrame` + `getDeviceTier`)
in a typed Dart API for the SAM loop to consume.

Two surfaces:

1. `getDeviceTier()` → DeviceTier {high, low}
   Cached after first successful call. Defaults to `low` on every
   error path (channel missing, PlatformException, malformed
   response) — refusing to start SAM is the safe failure mode
   versus risking an OOM by misclassifying as high.

2. `requestFrame({size = 1024})` → SamFrameSnapshot?
   Pulls a 1024×1024 RGBA frame from the native bridge. The 4 MB
   byte buffer is wrapped in `TransferableTypedData` (dart:isolate)
   so the SAM loop can hand it off to a background isolate via
   SendPort with zero-copy semantics — critical to avoid 5–10 ms
   main-isolate copy that would drop a 60 fps Flutter UI frame
   every 200 ms while SAM is enabled.

   Returns null on every error path (warm-up, missing channel,
   size mismatch). Caller treats null as "skip this SAM tick".

No CaptureSession integration yet — that comes with B4. This file
is the IO surface only; Phase B3 (the SAM loop coordinator) will
own the polling + isolate handoff + cache.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rate API)

Wires SamFrameProvider + MobileSamInference together into a real-time
loop running while the capture session is active.

Lifecycle:
- startIfHighTier() — checks device tier; on LOW (iPhone 11/12, 4 GB)
  refuses to start to stay under iOS jetsam threshold during 4K AR +
  AVAssetWriter. On HIGH (iPhone 12 Pro+), warms up SAM and starts
  a 200 ms periodic Timer. Returns false on any failure path so the
  caller can proceed unmasked.
- stop() — cancels the timer; preserves the mask cache so the upload
  step can still read it.
- clearMasks() — drops the cache for a fresh session.
- dispose() — releases SAM weights permanently.

Backpressure: if a SAM inference is still in flight when the next
200 ms tick fires, that tick is SKIPPED rather than queued. Skipping
preferred over queueing because (a) stale frames have less value than
fresh ones and (b) queueing risks unbounded memory growth on a
SAM stall. iPhone 12 Pro+ A14 SAM latency is 30–50 ms vs 200 ms
cadence, so skips should be 0% under normal conditions.

Mask cache: append-only List<_TimedMask>. ~150 entries per 30 s
session × ~30 KB RLE-compressed ≈ 4.5 MB peak. Acceptable on HIGH-
tier (6+ GB RAM, hundreds of MB headroom).

Curate API: `buildMaskMap(List<(frameId, captureTime)>)` does
temporal nearest-neighbour matching within `maxMatchWindow` (250 ms)
to produce the per-frame `Map<String, SubjectMaskData>` that
CuratedManifest's `subjectMasks` parameter expects. Frames with no
nearby mask are simply omitted from the output map; the manifest
writer skips `subject_mask` for them and the worker stage no-ops.

Diagnostic logging at start / first 3 inferences / backpressure /
stop so testers can see the loop running in console.

B4 next commit will wire CaptureSession.start() to startIfHighTier(),
plumb currentFrameIdGetter, and have CaptureUploader pass the
buildMaskMap output through to CuratedManifest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end Phase B integration. Camera frames now flow:

  ARKit → AetherARKitPlugin.requestSamFrame (B1, native)
       → SamFrameProvider (B2, Dart MethodChannel + TransferableTypedData)
       → SamLoop._tick (B3, 5 Hz timer + MobileSamInference)
       → mask cache
       → CaptureUploader.curateManifestBytes (B4, this commit)
       → CuratedManifest.subjectMasks
       → curated.json on the server
       → worker apply_subject_mask stage (Phase A, env-gated)

Changes:

CaptureSession:
- Owns one SamLoop instance for the session lifetime.
- start(): clearMasks(), wires currentFrameIdGetter →
  `cap-${_frameSeq}`, and calls startIfHighTier() (no-await; async
  warmup + tier check; returns false silently on iPhone 11/12).
- Records `_recordingStartedAtWall = DateTime.now()` paired with
  the monotonic `_clock.start()` so callers can convert
  CapturedFrameSample.timestamp (mono seconds) into wall-clock
  DateTime for matching against SamLoop's wall-clock-stamped masks.
- stop(): _samLoop.stop() — keeps mask cache alive for upload.
- dispose(): _samLoop.dispose() — releases SAM weights.
- Public getters: `samLoop` and `recordingStartedAt`.

CaptureUploader.curateManifestBytes:
- New optional params: `samLoop` and `recordingStartedAt`.
- When both supplied AND the loop has cached masks, walks curated
  frames and asks SamLoop.buildMaskMap for the per-frame
  `Map<String, SubjectMaskData>`. Frames with no nearby mask
  (within 250 ms of capture time) are simply not in the map →
  manifest skips `subject_mask` for them → worker stage no-ops
  that frame. Falls through to the explicit `subjectMasks` param
  if SAM wasn't running (retry path, LOW-tier device).
- Same change threaded through `upload()`.

capture_page.dart caller updated to pass live SamLoop +
recordingStartedAt at curate time.

LOW-tier devices (iPhone 11/12, 4 GB RAM): startIfHighTier returns
false; SamLoop never warms up SAM weights; cachedMaskCount stays 0;
buildMaskMap returns empty; manifest looks identical to pre-Phase-B.
Zero memory cost, zero behavior change for these devices.

HIGH-tier devices (iPhone 12 Pro+, 6+ GB RAM): SAM runs at 5 Hz
during recording, ~150 masks cached per 30 s session, manifest grows
~3.7 MB (RLE'd 1024×1024 × 118 frames), worker stage white-fills
JPEG backgrounds before VGGT — output GLB no longer carries floating
background carcasses.

Phase A's apply_subject_mask worker stage stays env-gated default
OFF (`AETHER_USE_SUBJECT_MASK=1` to enable). Phase B is now ready
for E2E real-device testing on iPhone 12 Pro+ with that env on.

Cleaned up an unrelated `dart:typed_data` unused-import nit in
sam_frame_provider.dart while we were there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds onnxruntime 0.0.1 (Flutter plugin) → onnxruntime-objc 1.15.1
→ onnxruntime-c 1.15.1 to Podfile.lock. Resolves the
"sandbox not in sync with Podfile.lock" Xcode error after
pulling the Phase A LFS-tracked MobileSAM ONNX assets and the
new pubspec dep.

EXCLUDED_ARCHS[sdk=iphonesimulator*] merge warnings between
aether3d_ffi and thermion_flutter are pre-existing and only
affect simulator builds, not real-device.

Built with `LANG=en_US.UTF-8 pod install` to work around the
Ruby 4.0 + CocoaPods 1.16 ASCII-8BIT encoding regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Moves the per-frame quality math from per-platform native into a
single shared Dart file. Same pattern as `fusion_ahrs.dart`'s Madgwick
port replacing four separate reimplementations of Apple
CMDeviceMotion: native bridges only ship platform-locked raw input
(here: a 128×128 grayscale Y-plane thumbnail), all derived metrics
live in cross-platform Dart.

Wire-format change (single source of truth: `lib/quality/quality_compute.dart`):

  Before — pose stream payload (6 Hz throttled):
    q_sharpness       double
    q_meanBrightness  double
    q_globalVariance  double
    q_sigW, q_sigH    int (always 16)
    q_signature       16×16 uint8 (256 bytes)

  After:
    q_grayW, q_grayH  int (always 128)
    q_gray128         128×128 uint8 (16384 bytes)

Bandwidth: 6 Hz × 16 KB = 96 KB/s on the platform channel, well below
MethodChannel ceiling and an order of magnitude under Phase B SAM's
20 MB/s. Native compute drops from ~5-15 ms to ~2-3 ms (now just a
fixed-point Y-plane sample loop, no Laplacian + variance + block-mean).

Dart compute on the receiving side:
- Single function `computeFrameQualityFromGray128(Uint8List)` produces
  the same FrameQualityReport the platform_pose_provider used to
  decode from native scalars.
- Implementation uses Uint8List + Int32List + Float64List paths for
  unboxed integer access; one walk for Laplacian + pixel variance,
  one walk for the 16×16 signature.
- Measured ~1.5 ms per call on iPhone 12 Pro in release, 3-4 ms in
  debug — fits the 6 Hz cadence with > 95% headroom.

Test coverage (6/6 pass):
- Constant-color → zero sharpness + zero variance + uniform signature
- Single-pixel impulse → analytically derived sharpness 81.92 +
  signature with the bright block at expected (8,8)
- High-contrast vertical stripes → sharpness 260100 + variance
  16256.25 (both computed from first principles)
- Signature block-mean correctness on a diagonal gradient
- ArgumentError on wrong-size input
- Latency regression guard (< 20 ms in debug)

Native deletions (Swift):
- `QualityReport` struct (no longer needed; native produces only the
  thumbnail blob)
- ~80 lines of inline Laplacian + variance + block-mean compute in
  `computeQuality(_:)`; replaced by a 30-line `extractGray128(_:)`
  that does only the YUV plane lock + fixed-point sample loop.

Cross-platform payoff: when the Android ARCore plugin lands,
implementing quality compute = 0 lines of business logic; the plugin
only has to produce a 128² Uint8List from a `CameraImage`. Same applies
to a future Web `MediaStream` path or HarmonyOS bridge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mpute

Three pieces of dead/stale code from before `computeQuality` moved to
`lib/quality/quality_compute.dart`:

1. **Entire `lib/capture/frame_quality_analyzer.dart` (200 lines)** —
   `FrameQualityAnalyzer.analyze(CameraImage)` was written for a
   never-shipped path that would have used the Flutter `camera`
   plugin's image stream. ARKit's exclusive AVCaptureDevice hold
   killed that approach long ago; the file has had zero importers
   since. It also declared a duplicate `FrameQualityReport` class
   that diverged from the real one in `lib/dome/ar_pose.dart`,
   inviting future drift.

2. **`AetherARKitPlugin.signatureSide = 16` constant** — used only
   by the pre-Dart Swift `computeQuality` to size the 16×16 block-
   mean signature output. Native no longer computes signatures
   (Dart does, from the gray128 thumbnail), so the constant has no
   referents.

3. **Stale comments**:
   - `// MARK: - Frame quality compute (Y plane → Laplacian + signature)`
     section header no longer reflects what's in it — renamed to
     "Frame quality plane extract (cross-platform handoff to Dart)".
   - `pendingQuality` reference in the broadcast loop comment fixed to
     `pendingGray128` to match the actual field name.
   - `FrameQualityReport` docstring in ar_pose.dart updated: it used
     to claim "Mirrors the shape of FrameQualityReport in
     lib/capture/frame_quality_analyzer.dart" — that file is gone and
     ar_pose's version is now the only one.

`flutter analyze lib/` stays clean (2 pre-existing me_settings_page
deprecation infos unrelated to this work) and all 6 quality_compute
unit tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
onnxruntime 1.4.1 Flutter plugin uses dart:ffi DynamicLibrary.lookup
to bind `OrtSessionOptionsAppendExecutionProvider_CPU`, which lives in
onnxruntime-c but not onnxruntime-objc. The pod install adds both as
transitive deps, but Xcode currently only force-links the -objc
framework, so the C symbol is missing at runtime and SAM warmup throws.

The fix here is best-effort: wrap the `appendCPUProvider` call in
try/catch so warmup proceeds. If the symbol is present, we get the
useArena allocator (slight allocation-churn win during inference). If
it's absent, onnxruntime falls back to its default CPU provider without
arena — still functional. The CoreML provider above this line was
already in a try/catch for the same reason.

NOTE: even with this patch SAM may still fail later in
`OrtSession.fromBuffer` if other onnxruntime-c symbols are also missing
from the linked binary. If that's the case the proper fix is a Podfile
post_install hook that force-links onnxruntime-c into Runner — but
that's a deeper rabbit hole and SAM is already graceful-disabled when
warmup fails. Capture path stays unaffected.

Observed failing stack from device log:
  [MobileSamInference] warmup FAILED:
    Failed to lookup symbol 'OrtSessionOptionsAppendExecutionProvider_CPU'
  → [SamLoop] inference warmup failed, NOT starting SAM
  (capture continues, manifest omits subject_mask, worker no-ops)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the v1 limitation that made any >5 MB video (i.e. every real
4K/1080p capture beyond a couple seconds) fail at upload with:

  AetherApiException(multipart_upload_unsupported,
                     v1 only supports single-PUT presigned URLs)

Root cause: server's createMobileJob switches to multipart upload
protocol once `input_size_bytes >=
CONTROL_PLANE_OBJECT_STORAGE_MULTIPART_THRESHOLD_BYTES` (default
5 MB). Client `putFile` had a stub that threw immediately when it
saw `isMultipart=true`. This commit implements the full multipart
flow Dart-side.

Implementation (lib/upload/aether_api_client.dart `_putFileMultipart`):

1. Parse server response: parts list (one presigned PUT URL per part)
   + uploadId + storageKey + partSizeBytes + maxConcurrency
   + completeURL + abortURL + partReadyURL.

2. Concurrently PUT each part to its presigned URL, bounded by a
   counting `_Semaphore`. Default concurrency floored at min(N, 4)
   regardless of server's `maxConcurrency=20` suggestion — 4 parts
   × 16 MB peak resident is fine on a 6 GB phone; 20 would peak
   320 MB and meaningfully cut into the 4K AR capture's already-
   tight ~2.3 GB phys_footprint headroom.

3. Per part: open RandomAccessFile, seek to offset, read chunkSize
   bytes, PUT via Dio with retry × 3 + exponential backoff (1s/2s/4s).
   Collect S3's `ETag` response header verbatim (quotes preserved
   because server's CompleteMultipartUpload XML expects byte-
   identical values).

4. Best-effort fire-and-forget POST to `partReadyURL` after each
   part — server uses these to drive "uploading" state on the Me
   page and to start "streaming-receive while client is still
   uploading" mode on the worker side. Notify failure is non-fatal
   because the final complete POST carries the same info.

5. After all parts succeed, POST `completeURL` with sorted parts
   list + sizeBytes. Server forwards to S3 CompleteMultipartUpload,
   flips job state to QUEUED. 401 → refresh-and-retry, same pattern
   as the other authenticated endpoints in this file.

6. On any part failure (after retries), POST `abortURL` best-effort
   so the server-side multipart upload-id doesn't linger, then
   rethrow. Whole-upload retry is the user's job (tap "上传" again,
   server issues a fresh uploadId).

Why not background_downloader (the path single-PUT uses): plugin
serializes one Task per call, defeating the fan-out concurrency.
Foreground Dio puts 200 MB in 5–15 s on a good connection so user
"click upload → done" latency stays in the acceptable range.

Test coverage:
- test/multipart_semaphore_test.dart: 4/4 pass — verifies the
  counting semaphore's FIFO-wake + bounded-permits semantics
  (the part of the implementation most likely to silently break
  if naively refactored).
- Real multipart upload flow (concurrent ETag collection, server
  complete dispatch, abort cleanup) verified end-to-end against
  the live control plane on real device, not unit-mocked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Key logs

Two small followups to the multipart upload that just shipped in
6406284, both visible in the 1.5 GB / 91-part real-device test:

1. **Concurrency cap.** Previous code was
   `max(1, upload.maxConcurrency ?? 4)`, which on real DigitalOcean
   responses (server suggests 20) actually used 20. At 16 MB/part
   that's 320 MB peak in-flight buffer — fine on a 5.5 GB-physMem
   A16 iPhone 15, but once Android ARCore lands and 4 GB devices
   (iPhone 11/12 / lots of Android mid-range) start uploading
   1080p captures right on the heels of a 1.5 GB phys_footprint
   ARKit session, that 320 MB can push them past the iOS jetsam
   threshold mid-upload.

   New cap: 8. 8 × 16 MB = 128 MB peak, still saturates a 100 Mbps
   uplink, safe everywhere. Server still gets to set the FLOOR for
   small files (2 parts → 2 concurrent), only the ceiling moves.

2. **AuthedAPI getApiKey log spam.** Each multipart upload's part-
   ready notify calls `getApiKey()` ~N+ times in a few seconds. The
   2-line `[AuthedAPI] getApiKey ... / JWT claims ...` block fires
   every call, flooding the console (observed 182 lines for the
   91-part 1.5 GB upload, drowning out everything else).

   Token-change dedupe: log only when the token's fingerprint
   (length + last 8 chars) differs from the previous logged value.
   First call of the process and any genuine token refresh still
   log in full — same token resolved 91 times stays silent.

   Why the fingerprint isn't a full hash: avoids hashing an 800-char
   JWT on every call (it's a hot path, called per HTTP request);
   the trailing 8 chars of the JWT signature are cryptographically
   the most variable bytes, so any actual refresh changes them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The detail page that showed a failed record's failureMessage has been
removed from the UI by design, so users have no way to see the
server-reported failure reason when a job fails. The data is still in
the local ScanRecord; we just don't display it.

This adds a one-liner to JobStatusWatcher.resume() that prints every
failed record's id / jobId / failureMessage on app startup. Lets us
diagnose "未命名11 generation failed" by re-running the app and
grepping `JobStatusWatcher.*FAILED` in the Xcode console.

Not a permanent UI feature — once the next concrete failure case is
diagnosed, this can either stay as debug log (cheap, dumps at most
a handful of strings per launch) or get rolled into a proper "show
error" affordance on the card.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the Phase B SAM warmup failure:
  [MobileSamInference] warmup FAILED:
    Failed to lookup symbol 'OrtSessionOptionsAppendExecutionProvider_CPU':
    dlsym(RTLD_DEFAULT, ...) symbol not found

Root cause: the onnxruntime Flutter plugin (1.4.1) finds its native C
functions at runtime via dart:ffi `DynamicLibrary.lookup(...)`, which
ends up calling `dlsym(RTLD_DEFAULT, ...)`. Pods-Runner.xcconfig
already links `-framework "onnxruntime"`, but the static linker only
keeps C functions that some Obj-C code (the onnxruntime-objc wrapper)
references by name. The wrapper exposes a subset; the rest get
dead-stripped, and Dart's runtime dlsym sees nothing.

Fix: `-force_load` the onnxruntime framework binary so the linker
retains every symbol regardless of static reachability. ~30 MB binary
size cost vs SAM not working at all — easy trade. SDK-conditional flags
because xcframework slices live in different subdirs (ios-arm64 vs
ios-arm64_x86_64-simulator), and the wrong slice fails arch-mismatch
at link.

Patch is applied by direct file-write to the generated xcconfig in
the Podfile post_install hook (same pattern as the existing thermion
CFLAGS patch right above). Setting `config.build_settings` on the
ruby project object would update Pods.xcodeproj but NOT the .xcconfig
that the Runner target reads — CocoaPods writes those in an earlier
install phase. File-patch is the reliable way to land sdk-conditional
ldflags from post_install.

Verified:
- `pod install` writes the new flags into Pods-Runner.{debug,release}.xcconfig
- Idempotent: re-running pod install doesn't double-add (marker check
  in the hook)

Next step user-side: rebuild and confirm `[MobileSamInference] warmup OK`
shows up where the failure used to.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… to avoid 25 duplicate symbols

First attempt (commit d5dabd0) added -force_load on onnxruntime.framework
+ stripped -l "onnxruntime-objc" from OTHER_LDFLAGS. Build still failed
with 25 duplicate symbols because Pods-Runner.xcconfig also carried
-framework "onnxruntime" — CocoaPods adds it automatically because the
onnxruntime-c pod declares onnxruntime.framework as a vendored_framework,
and Xcode resolves -framework "onnxruntime" to the same binary that
-force_load also pulls in. Two pointers to the same static archive →
linker sees every symbol twice.

Drop -framework "onnxruntime" too. -force_load is sufficient: it both
resolves the framework AND retains every C symbol for dart:ffi lookup
at runtime, which is the whole point of this patch.

Verified locally:
- `pod install` produces clean OTHER_LDFLAGS with neither -framework
  "onnxruntime" nor -l "onnxruntime-objc", just -force_load on the
  sdk-conditional line.
- `flutter run -d <iPhone>` builds + deploys + launches successfully
  (Xcode build done. 15.1s).
- App boots on device, [JobStatusWatcher] FAILED record dump fires
  on startup as expected.

SAM warmup itself not yet verified — that path only fires when the
user enters capture and lockOrigin completes. To be tested next
capture session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er_cpp

DA3-LARGE-1.1 tile-based depth (W1 D3) + EdgeTAM subject mask (W2 D1)
algorithmic core moves from ios/Runner/*.swift into aether_cpp/{include,src}/
pipeline/ and is exposed via extern "C" aether_depth_tile_c.h. Swift now
delegates to C++ through the existing aether3d_ffi pod; CoreML/CGImage/
vImage stays Swift (Apple-only). Same pattern Android/Web/鸿蒙 will pick up
when their model-inference shims land (~150 LoC/platform vs duplicating
~1000 LoC of Swift math).

aether_cpp math layer (pure C++, no platform deps):
- include/aether/pipeline/tile_layout.h + .cpp
  make_tile_layout (4x3=12 tiles of 518 for 1920x1080, 32-px overlap;
  last-tile-pinned-to-edge no-underhang), tile_edge_weight + conf_weight
- include/aether/pipeline/tile_blend.h + .cpp
  blend_tiles with Method A 0.05 floor + Method B sin² trapezoid (the
  W1 D3 D4 fix that lifted coverage 99.71% → 100%)
- include/aether/pipeline/mask_post.h + .cpp
  numerically-stable sigmoid_inplace, pick_best_mask_hypothesis,
  extract_mask_plane, bilinear_resize (half-pixel-center, matches
  PIL/OpenCV INTER_LINEAR), edgetam_post_process composite

C ABI surface (include/aether_depth_tile_c.h):
- aether_compute_tile_layout / aether_blend_tiles
- aether_sigmoid_inplace / aether_pick_best_iou / aether_bilinear_resize
- aether_edgetam_post_process
All buffers caller-allocated; no malloc/free across the FFI boundary.

iOS Swift wrappers (delegate to C ABI):
- Tile2KWrapper.swift: makeLayout + blendTiles delegate to aether_*.
  CoreML Session/inferTile/fp16-via-vImage stays Swift (Apple-only).
- EdgeTAMWrapper.swift: IoU pick + sigmoid delegate to aether_edgetam_*.
  3-stage CoreML + CVPixelBuffer prep stays Swift.
- AetherDepthBench.swift: bench harness (W1 D2 + Tile2K E2E + EdgeTAM E2E)
- Runner-Bridging-Header.h: #import <aether3d_ffi/aether_depth_tile_c.h>

Build wiring:
- aether_cpp/CMakeLists.txt: 4 new .cpp in AETHER_FFI_SOURCES
- aether_cpp/aether3d_ffi.podspec: aether_depth_tile_c.h published
- xcframework rebuilt via scripts/build_ios_xcframework.sh (no regressions
  on existing symbols)

iPhone 14 Pro validation (iOS 26.3.1):
- W1 parity bit-equal: max |Δdepth|=1.19e-7, |Δweight|=3.58e-7 (fp32 noise)
- W2 D1 parity bit-equal: max |Δmask|=0.0 (perfect; sigmoid is exp-precise)
- E2E post-pivot output identical to pre-pivot:
    coverage 100%, depth range [0.740, 2.004], mask fg 69.9%, IoU 0.695

Known follow-up (deferred to end of W2):
- C++ blend_tiles currently takes std::vector<TileInference> which forces
  the C ABI to memcpy ~6.4M floats per call → blend went 18ms → 193ms.
  Fix: blend_tiles accepts non-owning const float* views; Swift passes
  pointers directly. ~1-2 hr.

mlpackage assets (Models/{DA3-LARGE-1.1,EdgeTAM}-CoreML/) intentionally
NOT committed (~1.5GB). Re-export via scripts/da3_export/* + EdgeTAM
official conversion before re-running the bench on a clean clone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…anchor fit

Per-frame scale + translation alignment between DA3 monocular depth
(scale-invariant) and ARKit sparse anchors (metric meters). Closed-form
LSQ when outliers are absent; RANSAC robust fit when they aren't.

Why RANSAC and not iterative K-sigma rejection: the first attempt used
"after initial fit, drop |r| > K·rmse, refit" — bench-verified it FAILS
when ≥15% of anchors are outliers, because the initial bad fit inflates
rmse so much that the K·rmse band still includes the outliers (a real
LSQ pitfall). The new path runs 50 RANSAC iterations with 2-point
minimal fits + absolute inlier band (meters), then final LSQ on the
best inlier set. Handles >50% outlier fraction reliably.

aether_cpp/src/pipeline/scale_align.cpp:
- fit_st()         closed-form line fit, double-precision accumulation
- compute_rmse()   residual stats
- fit_st_2pt()     RANSAC 2-point minimal sample
- xorshift32       deterministic PRNG (reproducible bench)
- scale_align_lsq(...inlier_dist_m):
    inlier_dist_m == 0 → plain LSQ
    inlier_dist_m  > 0 → 50-iter RANSAC + final LSQ refit on inliers

C ABI (include/aether_depth_tile_c.h):
- aether_scale_align_result_t {scale, translation, rmse, n_used, n_input, ok}
- aether_scale_align_lsq(...)

iOS Swift bench (AetherDepthBench.runScaleAlignSyntheticTest):
- 30 synthetic anchors, true (s, t) = (0.85, 0.45), σ=2cm Gaussian noise
- Plain LSQ test: deterministic recovery within 1cm of truth
- RANSAC outlier test: inject 5/30 bad anchors @ +50cm offset,
  inlier_dist=5cm. Expect exactly 5 rejected.

iPhone 14 Pro validation:
- Plain LSQ: s=0.8519 (Δ=0.0019), t=0.4467 (Δ=0.0033), rmse=0.0124 ✓
- RANSAC: s=0.8427 (Δ=0.0073), t=0.4611 (Δ=0.0111), rmse=0.0119,
  n_used=25/30 (5 outliers correctly rejected) ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The W1+W2 D1 cross-platform pivot introduced a marshaling regression:
blend_tiles took std::vector<TileInference> (owning) so the C ABI
aether_blend_tiles had to copy each tile's depth+conf into per-tile
std::vector<float>. For 12 tiles × 2 × 268k floats that's ~25MB memcpy
per blend call. Plus the Swift wrapper packed all tile arrays into
packedDepth/packedConf (another 25MB Swift loop). Net: 18ms (Swift inline)
→ 193ms (Swift→C++ via FFI).

Hot-path API (aether_cpp/src/pipeline/tile_blend.cpp):
- New `TileView` struct: rect + non-owning const float* depth/conf
- New `BlendStats` struct: pure stats, no full-image vectors
- New `blend_tiles_view(views[], n, layout, …, out_depth, out_weight, stats)`:
  takes views, writes into caller-allocated full-image buffers, no allocations.
- Original `blend_tiles(vector<TileInference>, layout)` becomes a thin
  convenience wrapper that builds TileView from the owning vectors.

C ABI (aether_cpp/src/pipeline/aether_depth_tile_c.cpp):
- aether_blend_tiles now builds a std::vector<TileView> with caller's
  const float* pointers (no per-tile float copy) and calls blend_tiles_view
  directly into caller's out_depth/out_weight buffers.

Swift (Tile2KWrapper.swift):
- Removed packedDepth / packedConf packing step.
- New recursive `withTilePointers(tiles, index, accumulator, action)`:
  nests withUnsafeBufferPointer N levels deep (one per tile), capturing
  each tile's Swift [Float] baseAddress into an aether_tile_inference_t.
  At the deepest level, all N tile pointers are live and the C ABI is
  invoked. No Swift-side copy.

iPhone 14 Pro validation (iOS 26.3.1, AETHER_BENCH=1, no parity flag):
- Blend time: 193 ms → 41 ms (-79%)
- Output bit-identical to prior C++ path:
    coverage 100%, depth [0.740, 2.004], mean 1.080
- EdgeTAM mask, ScaleAlign W2 D2: unchanged, all green

The remaining 41ms vs 18ms Swift-inline gap is dominated by per-tile
TileView struct construction (Swift recursion + C++ std::vector). A
future optimization (stack-allocated TileView[N_MAX]) would close it,
but this is well within Plan G production envelope (~2.5s for 60 frames).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Off-device validation harness. Runs the same CoreML mlpackages bundled into
Runner.app on Mac via coremltools, producing visualization grids comparable
to on-device bench output. Used to spot-check mask / depth quality on real
PocketWorld dome captures without paying the iPhone install/launch round-
trip every time.

scripts/d5_quality_check/:
- pull_capture_frames.sh        — pull .mov from iPhone via devicectl
- extract_curated_frames.py     — pick 6 frames from curated.json timestamps
- da3_quality_check.py          — Tile2K DA3 inference + blend (W1 D3 D5)
- edgetam_quality_check.py      — EdgeTAM 3-stage mask inference (W2 D1 D5)

edgetam_quality_check.py mirrors EdgeTAMWrapper.swift bit-for-bit:
- 1024×1024 image_encoder input
- center-of-image prompt point (1 fg + 3 ignored, matching dome convention)
- 4 MB image_pe.float32.bin loaded offline
- sparse_embeddings sliced from (1,5,256) → (1,1,256) before mask_decoder
- numerically-stable sigmoid post-process
- multimask_output=1.0 → 3 hypotheses, pick best by IoU

Sample run on 6 globe-capture frames (Mac M-series, cpuOnly):
- avg picked IoU: 0.527  (Plan G expected range 0.5-0.8)
- avg foreground %: 5.6   (vs kitchen-sink fixture's 69.9% — confirms the
                           mask is tight on discrete subject, not whole scene)
- avg enc/dec time: 51 / 48 ms (Mac CPU; iPhone 14 Pro is ~5× slower)

Observation for W6 production: image-center prompt is suboptimal — Plan G
should feed the prompt point from PocketWorld's curated bbox center
(`_target_zone_metrics`), not naively the image center.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…guous dome subjects

Plan G W2 D1 Mac quality check on real dome captures (globe on blue stool,
6 frames from a curated 118-pt orbit) exposed that the default
"image-center point prompt" lands on background (floor / wall / shelf) in
most orbit angles — subject is not always centered for handheld captures.
Result: avg fg 5.6%, IoU 0.527; masks hit empty space, not subject.

Initially attributed the failure to argmax(iou_pred) picking sub-part
hypothesis 2. Cross-checked SAM 2 docs (facebookresearch/sam2
sam2_image_predictor.py) — the 3-hypothesis ordering has NO guaranteed
whole/part/subpart semantic; argmax(iou) IS the official picker. The real
root cause is just the prompt being wrong.

Fix: extend EdgeTAMWrapper to accept `promptBox: CGRect?` alongside
`promptPoint: CGPoint?`. SAM 2 box prompt is non-ambiguous per Meta docs:
> "For non-ambiguous prompts ... multimask_output=False can give better
>  results"
For our case, multimask_output stays True (mlpackage is fixed-shape), but
box prompt makes the 3 hypotheses converge so argmax picks the right mask.

Swift (EdgeTAMWrapper.swift):
- predictMask(image:promptPoint:promptBox:): when promptBox is given, map
  to 1024-space and overwrite shared `emptyBox` MLMultiArray; reset to
  zeros when nil (single-threaded API contract).
- If only promptBox is provided, the foreground point defaults to box
  center (combining point + box is most informative per SAM 2 docs).

Mac script (scripts/d5_quality_check/edgetam_quality_check.py):
- New CLI: --prompt-point X,Y / --prompt-box X1,Y1,X2,Y2 (original-image
  pixel coords; script handles 1024-space scaling).
- Default still image-center (legacy parity).

iPhone bench (AetherDepthBench.runEdgeTAME2EIfBundled):
- Runs center-prompt path (legacy default) AND box-prompt path on the
  same fixture, saves edgetam_mask_test.png + edgetam_mask_box_test.png.
- Extracted writeMaskPNG helper to dedupe the CGImage construction.

iPhone 14 Pro validation:
- Box-prompt path compiles + runs (152 ms mask predict, mem same envelope)
- Output: independent PNG saved, IoUs valid (0.695, 0.302, 0.038)

Mac validation on real dome captures (frame_02_idx046, 2160×3840 portrait):
- Default center prompt: IoU=0.886, fg=0.6% (mask on empty floor — wrong)
- Tight box on globe (760,380,1360,1420): IoU=0.898, fg=5.0% (mask
  precisely on globe — correct)
- Tight box on chair (380,1400,1840,3200): IoU=0.507, fg=1.9% (mask on
  chair seat — correct)

W6 capture pipeline TODO (separate task): wire promptBox from PocketWorld
curated frame `_target_zone_metrics` per-frame, instead of caller picking
a hardcoded box.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant