fix(replay): seed step_counters on fork so replay steps continue after inherited prefix#164
Conversation
…r inherited prefix Bug A — engine.fork() created the new timeline row but didn't seed its step_counters entry. The runner's first replayed step landed at step_number=1, colliding with steps inherited from the parent. Combined with the owned-over-inherited dedup added in PR #162, this shadowed the original turn-1..N steps with the agent's NEW turn-(N+1) work and put the inherited turn-N user message at the END of the timeline — sorting the user's edited question AFTER the agent's response. Live repro on dev1 session ray-agent-a18ac577 (2026-05-03): clicking Run replay on edited-fork at step 6 produced owned steps numbered 1..5 + an inherited step 6, displayed in that order. Operators saw "nothing after the LLM step" because the agent's response was already above the question they'd edited. Fix: seed step_counters[fork_id] = fork_at_step right after create_timeline. Next runner-recorded step gets fork_at_step+1, chronological order is preserved. Verified end-to-end through the dashboard: - Pre-fix: replay-64e61b6c shape was [1,2,3,4,5,6_inherited] - Post-fix: replay-d4c0d5b9 / replay-f0c54d86 shape is [6_inherited, 7,8,9,10,11] — matches the agent's actual conversation flow. Existing broken forks repaired via scripts/repair-fork-step- numbering.py (21 forks, 50 steps renumbered on dev1). Script is idempotent and only touches `replay-*` forks (auto-generated runner replay forks); user-created forks like edited-fork or fork-at-N are left alone since their owned steps at step_number ≤ fork_at_step are intentional promote-and-mutate edits. Tests: - fork_seeds_step_counter_so_replay_steps_continue_after_inherited_prefix - fork_at_step_1_seeds_counter_to_1_so_first_replay_step_is_2 - fork_does_not_disturb_existing_step_counters_on_other_timelines Version bump (Track 1 + Track 2 per CLAUDE.md): - Cargo workspace: 0.14.8 -> 0.14.9 - python-mcp/pyproject: 0.13.7 -> 0.13.8 - CLI_VERSION (rewind_cli, rewind_mcp_cli): 0.14.8 -> 0.14.9 - python/rewind-agent SDK: 0.15.2 -> 0.15.3 Co-authored-by: Cursor <cursoragent@cursor.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
risjai
left a comment
There was a problem hiding this comment.
Code Review — PR #164
Overall: The core Rust fix is correct and well-tested. The one-liner sync_step_counter call is the right approach and the three regression tests give excellent coverage (main path, edge-case fork@1, isolation from parent counter). Version bumps follow CLAUDE.md rules. PR body and test plan are thorough.
Findings (by severity)
P1 — Repair script creates step-number collisions when runner recorded more steps than fork_at_step
The script only renumbers owned steps with step_number <= fork_at_step, then shifts them by +fork_at_step. But in the broken scenario, the runner's counter started at 1 and kept incrementing — so a fork at step 5 where the runner recorded 8 steps has owned steps [1, 2, 3, 4, 5, 6, 7, 8]. The script renumbers 1→6, 2→7, 3→8, 4→9, 5→10 — but owned steps 6, 7, 8 already exist and don't get moved. Since steps has no UNIQUE constraint on (timeline_id, step_number), the UPDATEs succeed silently, creating duplicate rows at step_numbers 6, 7, 8. get_full_timeline_steps uses HashMap::insert so one of each pair is arbitrarily dropped.
Fix: shift ALL owned steps (not just those ≤ fork_at_step) by fork_at_step, and process in descending step_number order to avoid transient collisions if a UNIQUE index is ever added. See inline comment.
P2 — Non-atomic create_timeline + sync_step_counter in fork()
If sync_step_counter fails after create_timeline succeeds, the fork exists without a seeded counter — exactly the broken state this PR fixes. Low practical risk (same SQLite conn, same disk), but worth wrapping in a store-level transaction for correctness.
Nit — The 18-line doc comment on fork() duplicates the commit message and test comments. Consider trimming to 3-4 lines ("Seeds step_counters so the first runner-recorded step continues at at_step + 1. See regression test for rationale.").
| cur.execute(""" | ||
| SELECT id, step_number FROM steps | ||
| WHERE timeline_id = ? AND step_number <= ? | ||
| ORDER BY step_number |
There was a problem hiding this comment.
P1 Bug — step-number collision when runner recorded more than fork_at_step steps.
This query only selects owned steps with step_number <= fork_at_step, then shifts them by +fork_at_step. But the broken counter started at 1, so a runner that recorded M steps produces owned steps [1..M]. When M > fork_at_step, the renumbered early steps (e.g. 1 → fork_at+1) collide with existing un-shifted later steps (e.g. step at fork_at+1).
Since steps has no UNIQUE constraint on (timeline_id, step_number), the UPDATEs succeed silently, creating duplicate step_numbers. get_full_timeline_steps's HashMap::insert then drops one arbitrarily.
Suggested fix — shift ALL owned steps, not just those ≤ fork_at:
cur.execute("""
SELECT id, step_number FROM steps
WHERE timeline_id = ?
ORDER BY step_number DESC -- descending avoids transient collisions
""", (fork["id"],))Then new_num = s["step_number"] + fork_at for every owned step. Processing in DESC order ensures the highest step_number is moved first, so no in-flight collision can occur if a UNIQUE index is ever added.
| /// its parent. Without seeding, the first new step a runner | ||
| /// records on the fork would land at step_number=1, shadowing | ||
| /// the inherited prefix in `get_full_timeline_steps` (which | ||
| /// dedups owned-over-inherited at the same number) and sorting |
There was a problem hiding this comment.
P2 — Non-atomic seeding. If sync_step_counter fails after create_timeline succeeds (e.g. disk-full between the two writes), the fork exists without a seeded counter — the same broken state this PR fixes.
Low practical risk on a single SQLite connection, but consider wrapping both calls in a store-level transaction (or adding a create_timeline_and_seed_counter helper) so they're atomic.
P1 (repair script): the SQL renumber was selecting only owned steps with step_number <= fork_at_step, then shifting each by +fork_at_step. That was wrong when the runner overran the fork point — a fork@5 with M=8 recorded iterations has owned steps [1..8]; shifting only [1..5] leaves [6..8] in place, and the renumbered early steps land on top of them ([1->6, 2->7, 3->8] all collide). Since `steps` has no UNIQUE on (timeline_id, step_number), these UPDATEs would silently create duplicates and HashMap::insert in get_full_timeline_steps would arbitrarily drop one of each pair. Fix: shift EVERY owned step on a flagged fork (not just ≤ fork_at_step) and process in DESCENDING step_number order. Each UPDATE then targets a step_number that doesn't yet exist among un-shifted rows, even if a UNIQUE constraint is added later. Idempotent: a second run finds nothing to fix because all owned steps now sit at step_number > fork_at_step. P2 (atomicity): the previous fork() called create_timeline followed by sync_step_counter as two separate writes on the same connection. If the second one failed (e.g. disk-full between the two calls), the fork existed without a seeded counter — the exact broken state this PR fixes. Wrap both into a single transaction via a new `Store::create_timeline_with_seeded_counter` helper. Nit: trimmed the 18-line doc comment on `fork()` to 3 lines and moved the rationale into the regression test (which is where future maintainers will read it anyway). CI fix: `test_patch_promote_main_protection_follows_target` was implicitly relying on `.last()` returning an inherited step (the buggy fork numbering put inherited steps after owned ones). After this PR's fix the fork's `.last()` is the fork-OWNED step at step_number=fork_at_step+1, and that step isn't visible on main, so the promote-and-mutate visibility check 400s. Pin the test to explicitly select step_number=2 (the inherited one) so it tests main-protection regardless of the step-counter behavior. Also added a `main_edits_env_lock()` mutex shared by the two tests that mutate REWIND_ALLOW_MAIN_EDITS — previously they raced under parallel `cargo test`, surfaced once both tests do real PATCHes against main. 296 tests across rewind-web pass, 16 in rewind-replay pass, full workspace green. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Pushed P1 (repair script collision) — P2 (atomic fork seed) — added Nit — trimmed CI fix (separate from the review) — Full workspace green: 296 tests across |
risjai
left a comment
There was a problem hiding this comment.
Re-review: all findings addressed ✓
P1 (collision) → Repair script now shifts ALL owned steps (no step_number <= fork_at filter), DESC order avoids transient collisions. Docstring updated with rationale.
P2 (non-atomic) → New create_timeline_with_seeded_counter wraps both writes in a single unchecked_transaction. fork() calls the one atomic method. Clean separation — the store owns the transaction boundary.
Nit (doc comment) → Trimmed to 4 lines, points to regression test.
Bonus: Good catch on test_patch_promote_main_protection_follows_target — the .last() lookup was implicitly relying on broken counter behavior (returning the inherited step at #2 by accident). Switching to explicit step_number == 2 lookup makes the test correct under both old and new counter semantics. The OnceLock<Mutex<()>> env-var serialization is the right pattern for the parallel test race.
No new issues found. LGTM — ready to merge.
Summary
engine.fork()created the timeline row but didn't seedstep_counters, so a runner's first replayed step landed atstep_number=1. With PR fix(replay): dedupe inherited vs owned steps in get_full_timeline_steps #162's owned-over-inherited dedup that shadowed the original turn-1..N steps and pushed the inherited turn-N user message to the END of the timeline. Operators saw "nothing after the edited question" because the agent's reply was above the question chronologically.step_counters[fork_id] = fork_at_stepright aftercreate_timeline. First runner-recorded step on the fork now getsfork_at_step + 1. Chronological order preserved.dev1sessionray-agent-a18ac577: clicking Run replay onedited-forkat step 6 used to produce owned steps[1, 2, 3, 4, 5]+ inherited6. Now produces inherited6+ owned[7, 8, 9, 10, 11].replay-*forks repaired in place viascripts/repair-fork-step-numbering.py(idempotent; only touches replay-prefixed labels — leavesedited-fork/fork-at-Nalone since their owned step atstep_number == fork_at_step + 1is intentional promote-and-mutate semantics).Test plan
cargo test -p rewind-replay --lib— 16 tests pass including 3 new regression tests:fork_seeds_step_counter_so_replay_steps_continue_after_inherited_prefixfork_at_step_1_seeds_counter_to_1_so_first_replay_step_is_2fork_does_not_disturb_existing_step_counters_on_other_timelinescargo test -p rewind-store -p rewind-web --lib— 108 tests pass, no regressions.vitest run StepDetailPanel.test— 15 UI dispatch tests pass.v0.14.9binary on the PVC:window.fetchand clicked Run replay → captured POST body showssource_timeline_id=<edited-fork-uuid>andat_step=6(not main, no UI bug).replay-f0c54d86) renders steps6 → 7, 8, 9, 10, 11in order; final llm response is the agent's answer to the user's edited turn-2 question.replay-64e61b6c(the user's previously-broken fork) now displays in correct order.Version bumps (per CLAUDE.md)
Track 1 (Rust binary, since
crates/rewind-replaychanged and v0.14.8 has been released):Cargo.toml: 0.14.8 → 0.14.9python/rewind_cli.pyCLI_VERSION: 0.14.8 → 0.14.9python-mcp/pyproject.toml: 0.13.7 → 0.13.8python-mcp/rewind_mcp_cli.pyCLI_VERSION: 0.14.8 → 0.14.9Track 2 (Python SDK, since
python/rewind_cli.pychanged and 0.15.2 is on PyPI):python/pyproject.toml: 0.15.2 → 0.15.3python/rewind_agent/__init__.py__version__: 0.15.2 → 0.15.3Post-merge checklist
v0.14.9on GitHub Release (triggers binary build)../scripts/publish-pypi.shfrompython/../scripts/publish-mcp-pypi.shfrompython-mcp/.Made with Cursor