Unify provider derive/project through ConversationView; drop Turn.extra#86
Merged
Conversation
…file_mutations Additive data-model changes on `ConversationView` and `ToolInvocation` to prepare for provider unification — providers will populate these in their `to_view` so `derive_path` can be the single canonical `ConversationView -> Path` projection. New on `ConversationView`: - `base: Option<SessionBase>` — path-level base (working_dir, vcs_revision, vcs_branch, vcs_remote). Will project to `Path.base`. - `extra: HashMap<String, Value>` — path-level provider-namespaced extras. Will project to `Path.meta.extra`. New on `ToolInvocation`: - `file_mutations: Vec<FileMutation>` — resolved file mutations (path, operation, raw_diff, before, after) computed by the provider's `to_view`. Will project to sibling `file.write` change entries. `ConversationView` and `ToolInvocation` now derive `Default` so the new fields can be omitted from struct literals with `..Default::default()`. Existing call sites updated. No behavior change yet — `derive_path` / `extract_conversation` don't project the new fields; providers don't populate them. Wires up next.
…_path `derive_path` now projects the new `ConversationView` fields: - Step ids use `turn.id` / `event.id` when non-empty; otherwise synthesize `step-NNNN` / `event-NNNN` as before. Lets claude's native UUIDs round-trip cleanly through `extract_conversation`. - `view.base` (`SessionBase`) projects onto `Path.base` (`uri` from `working_dir`, `ref` from `vcs_revision`, `branch` from `vcs_branch`). `vcs_remote` (no slot on `Path.base`) lands in `meta.extra["vcs_remote"]`. `config.base_uri` still wins as a CLI override. - `view.extra` merges into `path.meta.extra`. - Each `ToolInvocation::file_mutations` entry emits a sibling `file.write` change keyed by `path` with `raw = raw_diff` and `tool` / `tool_id` / `operation` / `before` / `after` in `structural.extra`. Falls back to the existing synth-from-tool-input behavior when `file_mutations` is empty so un-migrated providers keep working. `extract_conversation` is the inverse: extracts `path.base` / `meta.extra` / sibling `file.write` entries back onto `view.base` / `view.extra` / `tool_uses[].file_mutations` (matched by `tool_id`). Four derive tests that pinned to synthesized `step-NNNN` ids updated to the native turn ids.
Pi's `derive_path` is already a one-line wrapper around
`toolpath_convo::derive_path`, so the only migration work is enriching
`session_to_view` with the new view fields where applicable.
- `view.base` now carries `working_dir` from `session.header.cwd`.
- `view.extra` left empty (no provider-namespaced path-level extras today).
- `tool.file_mutations` left empty (Pi sessions don't carry pre-resolved
diffs; the synth-from-tool-input fallback in `derive_path` handles file
writes as before).
End-to-end test updated: head now equals the last turn's native id ("m4")
instead of the synthesized "step-0004", since `derive_path` now preserves
turn ids when set.
…ame_to
Migrates `toolpath-codex/src/derive.rs` from its bespoke
`derive_path_from_view` to a one-line wrapper around
`toolpath_convo::derive_path`. All codex-specific data
(cwd, git, file diffs from `patch_apply_end`) is captured
during `to_view`; nothing provider-specific lives in
`derive.rs` anymore (767 lines → ~170).
Behavior changes for codex-derived paths:
- File-mutation `structural.type` unifies on `"file.write"`
(was `codex.add` / `update` / `delete` / `unknown`).
`operation` lives in `structural.extra`; rename target
lives in the new `FileMutation.rename_to`.
- `tool_calls` summary array on `conversation.append`
dropped — it was a pre-formatted display alias for
`tool_uses` (same info).
- `phase` field on `conversation.append` dropped — still
accessible under `turn.extra["codex"]["phase"]`.
- Actor `Identity` records simplify to the shared shape.
- Empty-carrier turn filter moved into `to_view`.
- Turns now carry `parent_id` linked sequentially.
Synthetic ids (`codex-turn-NNNN`) assigned to turns
whose source message had no native id. Event ids
disambiguated by suffixing collisions.
IR additions (the only typed slots that earned their keep):
- `ProducerInfo { name, version }` and
`ConversationView.producer` — codex's `originator` +
`cli_version` have no existing slot.
- `FileMutation.rename_to: Option<String>` — first-class
rename concept; replaces codex's `move_path`.
Things explicitly kept off the IR (no smuggling through
provider-namespaced extras):
- `model_provider` ("openai", "anthropic", …) already
lives on `ActorDefinition.provider` per assistant turn.
- Codex's `source` ("cli") and `forked_from_id` have no
cross-harness analog; the codex projector hard-codes
defaults on the return path, so they drop here.
- `byte_count` / `line_count` are derivable from `after`.
Also: shared `derive_path` now chains event steps off the
previous step when their `event.parent_id` isn't set, so
codex's `session_meta` / `turn_context` / `patch_apply_end`
event steps land on the head ancestry instead of orphaning.
… to Turn
Migrates `toolpath-opencode/src/derive.rs` from its bespoke
`derive_path_from_view` to a one-line wrapper around
`toolpath_convo::derive_path`. The snapshot git2 tree↔tree diff work
moves into `provider::to_view_with_resolver`, which now produces a
fully enriched `ConversationView` (snapshot-derived mutations on each
turn, tool-input fallback for gitignored paths, `view.base` from
`session.directory` + `session.project_id`, `view.producer` for the
opencode version). `derive.rs` shrinks from ~820 lines to ~200.
IR refinement included in this commit: `file_mutations` moves from
`ToolInvocation` to `Turn`, with an optional `tool_id` field on each
`FileMutation` linking back to a specific tool call when the provider
can attribute it. This is the honest model — codex's `patch_apply_end`
DOES attribute per-tool (via `call_id`), but opencode's snapshots are
per-turn (no per-tool linkage available), and claude/gemini will
populate `tool_id` from tool-input synthesis. The shared `derive_path`
projects each `Turn.file_mutations` entry to a sibling `file.write`
change, putting `tool` / `tool_id` in `structural.extra` when present.
Behavior changes for opencode-derived paths:
- File-mutation `structural.type` unifies on `"file.write"`
(was `opencode.add` / `update` / `delete` / `touch`).
- `tool_calls` summary array on `conversation.append` dropped — it was
a pre-formatted display alias for `tool_uses`.
- `meta.extra.opencode` aggregate dropped — `session_id` was
`view.id`, `project_id` is `view.base.vcs_revision`, `slug` /
`version` / `title` flow through other slots, `total_tokens` is
`view.total_usage`, `files_changed` is `view.files_changed`. The
opencode projector hard-codes safe defaults on the return path.
- Actor `Identity` records simplify to the shared shape.
- Empty-carrier turn filtering moved into `to_view`.
`no_snapshot_diffs` kept on opencode's `DeriveConfig` for tests/CLI
intent ("don't open git2"); skipping just routes through `to_view`
instead of `to_view_with_resolver`.
Claude was the most invasive migration — `derive.rs` shrinks from 2759
lines to ~165. All Claude-specific work moves into `provider.rs`:
- `conversation_to_view` now pulls `view.base` (working_dir, vcs_branch
from the first entry that has them) and `view.producer` (name:
"claude-code", version from the first entry's `version`).
- New `compute_file_mutations` helper populates `Turn.file_mutations`
for every `FileWrite`-category tool invocation. Calls `git show
HEAD:<path>` via the moved-in `git_head_content` to resolve `Write`'s
pre-image, then `toolpath_convo::file_write_diff` to synthesize the
unified diff. `tool_id` on each mutation links back to the
invocation. Mutation `operation` ("add" for `Write`, "update" for
`Edit` / `MultiEdit` / `NotebookEdit`) and `after` (Write content)
populate where they're available.
Behavior changes for claude-derived paths:
- Tool executions no longer become their own `tool.invoke` steps
(actor `agent:claude-code/tool:<Name>`). File mutations attach to
the parent assistant turn step as sibling `file.write` change
entries, like every other provider.
- Conversation-artifact key unifies on `claude-code://<session-id>`
(was `agent://claude/<session-id>`).
- No more `conversation.init` step. The cwd / vcs_branch / version it
carried lived only at the path level; they now ride `view.base` and
`meta.extra.producer`, which both round-trip through extract.
- Step ids are claude entry UUIDs (already what claude emitted).
`derive.rs` reduced to: wrap with title override + `view = to_view()`
+ delegate to `toolpath_convo::derive_path`. The 100+ tests it carried
collapse to four smoke checks (basic shape, producer present, actors
populated, single-path ancestry). The covered behaviors live elsewhere
now: shared derive_path tests in `toolpath-convo`, view-shape tests in
`toolpath-claude/src/provider.rs` (38 tests, unchanged), and
fidelity-roundtrip tests in `toolpath-claude/tests/` (18 tests,
unchanged).
Cross-harness `matrix_translation` test passes for all 25 cells.
…complete Gemini's `derive.rs` shrinks from 912 lines to ~135. All gemini-specific work moves into `provider.rs`: - `conversation_to_view` sets `view.base` from `project_path` or `main.directories()` and `view.producer` (name: "gemini-cli", no version recorded in the source format). - New `compute_file_mutations(msg.tool_calls())` helper populates `Turn.file_mutations` for every `FileWrite`-category call. Diff preference: gemini's own `resultDisplay.fileDiff` when present (the harness already computed it), otherwise a hand-rolled fallback from args (old/new pair for `replace`, content for `write_file`). `tool_id` links to the source `ToolCall::id`. - `fallback_raw_diff` moved verbatim from `derive.rs`. - Turns are linked sequentially via `parent_id` since Gemini's wire format doesn't carry one on messages. (Sub-agent linearization via `Turn.delegations` was already working — unchanged.) Behavior changes for gemini-derived paths: - File-mutation `structural.type` unifies on `"file.write"` (was `gemini.write_file` / `gemini.replace` / `gemini.edit`). - Tool-call summary array on `conversation.append` dropped (same redundancy as we addressed for codex/opencode/claude). - Conversation-artifact key shifts to `gemini-cli://<session-uuid>` (matches the `<provider_id>://<id>` convention every other provider now uses). `derive.rs` reduced to: wrap with title override + `view = to_view()` + delegate to `toolpath_convo::derive_path`. Old test suite (26 tests) replaced with four smoke checks. This is the last provider migration. All five conversation provider crates (`pi`, `codex`, `opencode`, `claude`, `gemini`) now share the canonical `ConversationView → Path` mapping. Each provider's `derive.rs` is a one-line wrapper; the IO and provider-specific fidelity work lives in `to_view`. Cross-harness `matrix_translation` passes for all 25 cells.
…a Turn.extra Refactors opencode's `Builder` so the snapshot git2 tree↔tree diff IO happens during `handle_assistant_message` (where the source `Part`s are in scope) rather than as a post-pass that reads back the snapshot SHAs from `Turn.extra["opencode"]["snapshots"]`. Concretely: - `Builder` gains `snapshot_repo: Option<git2::Repository>` and `prev_snapshot_after: Option<String>` fields. - New `compute_turn_mutations(&snapshots, &tool_uses)` method does the diff against the previous turn's `after` snapshot and threads the tool-input fallback for paths the diff missed (gitignored / no-repo). - `build_with_resolver` no longer calls `attach_snapshot_diffs` / `attach_tool_input_fallbacks` — those free functions are gone. `Turn.extra["opencode"]["snapshots"]` is still written for the opencode projector's round-trip path (it consumes those SHAs to re-emit `step-start` / `step-finish` parts). That comes off in the next commit when `Turn.extra` itself is removed. Prep for dropping `Turn.extra` as a pipeline mechanism for IR data; provider-namespaced extras are about to go away entirely.
Turn.extra was a per-provider stash (Turn.extra["claude"], Turn.extra["pi"],
etc.) for fields that didn't fit the typed ConversationView. Each provider's
projector then read its own namespace back out to rebuild the wire format
with high fidelity.
The IR's job is cross-harness translation, not source-format preservation.
Carrying provider-specific blobs end-to-end conflated those two goals and
encouraged providers to lean on extras instead of mapping cleanly onto the
shared types. With the field removed, every provider now derives onto and
projects from the typed IR only.
Two ripple effects worth noting:
- Pi's tool-result entries no longer emit a standalone Role::Other("tool")
turn. The result content folds into the matching assistant turn's
tool_uses[i].result via the same pass-2 mechanism the other harnesses
already use. Aligns Pi with claude/gemini/codex/opencode and keeps
Pi → Pi idempotent without smuggling tool_call_id.
- Round-trip fidelity is now best-effort on fields that lived only in
extras: codex encrypted reasoning ciphertext, gemini per-thought
timestamps, gemini token breakdown (thoughts/tool/total), opencode
snapshot/patch metadata, pi compaction/bashExecution metadata. The
cross-harness matrix still passes — translation works; only wire-level
exact equality on the originating harness regresses.
Tests that asserted on the smuggled extra contents have been removed
(extras themselves are gone, so the assertions can't express anything).
The cross_harness_matrix no_foreign_extras invariant is gone for the
same reason.
…ge as snake_case Two UI-breaking bugs in the claude import/export round trip: **1. Broken turn chain.** When a tool-result-only user entry was absorbed into the preceding assistant's tool_uses[i].result, its UUID was dropped on the floor. The next assistant entry's wire parentUuid (which pointed at the absorbed entry) then had no matching Turn in the IR, so `derive_path` couldn't resolve it and emitted an empty parents list. In a real session that meant **985 of 2661 steps had no parents** after roundtrip. Claude's UI walks the chain from the latest entry backwards via parentUuid; the first None bounce stopped the walk at the tail, which is why opening the re-exported conversation showed only the very last response. Fix: when absorbing a tool-result-only entry, record a rewrite from `entry.uuid` → previous turn id. Apply the rewrite when computing the next turn's parent_id. Now every assistant's parent resolves and the walk from head reaches the root user. **2. "NaN tokens old" in the UI.** The `Usage` struct was tagged `rename_all = "camelCase"`, which emitted `inputTokens` / `outputTokens` / `cacheCreationInputTokens` / `cacheReadInputTokens` on serialize. But Claude's wire format passes `message.usage` straight from the Anthropic API, which is snake_case. The UI parses `input_tokens` etc., got undefined for every field, and computed NaN. Fix: switch to `rename_all = "snake_case"` on `Usage` and `CacheCreation`. The `#[serde(alias = ...)]` attrs were already snake_case, so deserialization was never broken — only serialization.
|
🔍 Preview deployed: https://e40f36c8.toolpath.pages.dev |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Route every
toolpath-<harness>provider throughConversationViewas the canonical IR:derive_pathandextract_conversation(intoolpath-convo) are the single forward/reverse pair. All five providers (claude/codex/gemini/opencode/pi) lost their bespoke derive logic — net −4800 lines across the provider crates.ConversationViewgained typed fields for everything that was load-bearing:SessionBase,ProducerInfo,FileMutationonToolInvocation,eventsfor non-turn entries.Turn.extra: HashMap<String, Value>removed entirely. It had become a smuggling channel for source-format-specific blobs and conflated cross-harness translation with wire-format preservation.Trade-offs
Round-trip fidelity is now best-effort on fields that only lived in
Turn.extra: codex encrypted reasoning ciphertext, gemini per-thought timestamps and the thoughts/tool/total token sub-counts, opencode snapshot/patch metadata, pi compaction/bashExecution metadata. The four canonical token fields (input/output/cache_read/cache_write) still survive across all harnesses. Thecross_harness_matrixtest exercises every (source, target) pair and still passes — translation is preserved; only wire-level exact equality on the originating harness regresses for these specific fields.One behavioral change: pi's
ToolResultentries no longer emit standaloneRole::Other(\"tool\")turns. The content folds onto the matching assistant'stool_uses[i].resultvia pass 2, aligning pi with claude/gemini/codex/opencode and keeping pi→pi idempotent without smugglingtool_call_idthrough extras.Bug fixes piggybacked on the last commit
While testing import/export against a real Claude session, two UI-visible bugs surfaced:
tool_uses[i].result, its UUID was dropped — the next assistant's wireparentUuidthen had no matching Turn, andderive_pathleftstep.parentsempty. In a real session, 985 of 2661 steps were parentless after roundtrip; Claude's UI walks the chain from the tip and stopped at the last response. Fix: record a rewrite from the absorbed entry's UUID to the previous turn's id and apply it when computing the next turn'sparent_id.Usagestruct was taggedrename_all = \"camelCase\", but Claude's wire format forwardsmessage.usagestraight from the Anthropic API (snake_case). The UI parsedinput_tokensetc. and got undefined. Fix:rename_all = \"snake_case\"onUsageandCacheCreation.Test plan
message.usagekeys serialize as snake_caseNeed help on this PR? Tag
@codesmithwith what you need.