feat: improve build progress output by Moskize91 · Pull Request #62 · oomol-lab/spinedigest

Moskize91 · 2026-06-17T07:38:50Z

No description provided.

coderabbitai · 2026-06-17T07:39:03Z

Summary by CodeRabbit

Documentation
- Clarified archive exploration workflow documentation across all help topics and guides, emphasizing the recommended command sequence for structured browsing.
Improvements
- Enhanced build progress reporting with live streaming updates and refined status visualization during archive processing.

Walkthrough

The PR makes two independent sets of changes. First, all documentation (README files, Jinja help templates, and Markdown docs in English and Chinese) is updated to insert chapter tree --json as the first step in structure-mode archive exploration, before list --type chapter and page chapter:<id>, and to clarify that chapter tree without apply is read-only.

Second, a streaming progress pipeline is added to the code: LLMStreamProgressCallback is defined and stored on LLM, which emits per-chunk character counts during streaming; SerialProgressTracker is replaced by a new SerialProgressSink interface; the chapter facade gains per-chapter word counts and emits "progress" events with graph/summary word-count snapshots; and the CLI progress writer is refactored from a heartbeat system to a throttled single-line status renderer fed by the new stream progress callbacks.

Possibly Related PRs

oomol-lab/spinedigest#54: Directly related — both PRs modify createStageLLM in src/cli/stage-runtime.ts; this PR adds the onStreamProgress option while the earlier PR introduced the function for sdpub stage workflows.
oomol-lab/spinedigest#56: Related — the main PR's advanceChapterStages progress event changes and streaming plumbing build on the sdpub stage advance/progress/heartbeat wiring established in that PR.
oomol-lab/spinedigest#57: Related — the main PR's src/cli/archive.ts streaming progress refactor extends the archive-first CLI build infrastructure introduced in that PR.

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	❓ Inconclusive	No pull request description was provided, making it impossible to assess whether it relates to the changeset.	Add a description explaining the purpose of the build progress improvements, the problem being solved, and how the changes address it.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat: improve build progress output' follows the required format and clearly describes the main change: enhancing build progress visualization.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

✨ Simplify code

Create PR with simplified code
Commit simplified code in branch feat/build-progress-line

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

README.md (1)
188-188: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update AI-agent mode guidance to match the new structure-first flow.

Line 188 still says to start structural understanding with list/page, which now conflicts with the updated structure-mode contract (chapter tree --json first). Please align this line to avoid mixed instructions in the same README.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` at line 188, The exploration mode guidance on line 188 is outdated
and conflicts with the new structure-first flow requirement. Update the sentence
to indicate that `chapter tree --json` should be the first step for structural
understanding, then use `list/page` for further refinement, and keep the
guidance about `find/grep` for candidate discovery and `read` for continuous
prose. This will align the exploration mode instructions with the updated
structure-first contract.
README_zh-CN.md (1)
188-188: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

将 AI Agent 指引同步为“先 chapter tree --json”流程。

Line 188 仍写“结构理解先用 list/page”，与本次已更新的结构模式（先 chapter tree --json）不一致，建议统一口径避免歧义。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README_zh-CN.md` at line 188, The guidance text on line 188 in the
README_zh-CN.md file contains outdated information about the exploration mode
process. Specifically, the phrase "结构理解先用 `list/page`" contradicts the updated
process that now recommends starting with `chapter tree --json`. Update this
line to replace the reference to `list/page` with `chapter tree --json` to align
with the current structure understanding approach and maintain consistency
throughout the documentation.

🧹 Nitpick comments (3)

src/llm/client.ts (1)
375-385: ⚡ Quick win

Consider logging suppressed callback errors.

The empty catch block silently suppresses all errors thrown by the onStreamProgress callback. While this prevents callback failures from breaking the stream, it also hides programming errors in callback implementations, making debugging difficult.

Consider logging the error before returning:
   try {
     await this.#onStreamProgress({ outputCharacters });
   } catch {
-    return;
+    // Log error but don't propagate to avoid breaking the stream
+    getLogger({ component: "llm" }).warn("Stream progress callback failed", error);
+    return;
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/client.ts` around lines 375 - 385, The `#emitStreamProgress` method has
an empty catch block that silently suppresses any errors thrown by the
`#onStreamProgress` callback, making it difficult to debug callback implementation
issues. Add error logging to the catch block to capture and log the error before
returning, ensuring that callback failures are visible for debugging while still
preventing the stream from being interrupted by callback errors.
src/facade/chapter.ts (1)
593-605: ⚖️ Poor tradeoff

Consider performance impact of synchronous word counting.

This code iterates all fragments and sentences synchronously to compute word counts for each chapter. For archives with many chapters and large fragment collections, this could introduce noticeable latency in listChapters().

Consider these options:

Cache word counts when fragments are created/modified (store alongside fragment records)

Parallelize the fragment iteration using Promise.all() if the order doesn't matter

Profile actual performance with representative data to confirm if optimization is needed

Since this follows the existing pattern in the file and words are only computed during advancement (not on every read), the current implementation may be acceptable. However, it's worth monitoring if users report slow chapter listings.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/facade/chapter.ts` around lines 593 - 605, The word counting in the loop
over fragmentIds is performed sequentially, fetching and processing each
fragment one at a time, which can be slow for large collections. Replace the
synchronous for loop that calls serialFragments.getFragment(fragmentId) with a
Promise.all() approach that fetches all fragments in parallel, then accumulates
the word counts from all fetched fragments concurrently. This will maintain the
same functionality while improving performance for archives with many fragments.
src/cli/archive.ts (1)
368-380: 💤 Low value

Document the token estimation assumption.

The hardcoded outputCharactersPerToken = 4 (line 380) is a rough approximation that can vary significantly:

By model: GPT-4 vs Claude vs other models have different tokenizers

By language: Chinese/Japanese typically have fewer characters per token (~1-2), while English averages ~4

By content: Code, markdown, and plain text tokenize differently

While this is only used for display (not billing), the estimated token count might be confusing or misleading to users working with non-English text or specific models.

Consider:

Document the assumption in a comment explaining it's an English-language average

Make it configurable via CLI flag or config file if needed

Use model-specific defaults if the model provider/ID is known

For now, adding a comment would improve maintainability:
- const outputCharactersPerToken = 4;
+ // Rough estimate: ~4 chars/token for English text with common LLMs
+ // Actual ratio varies by model, language, and content type
+ const outputCharactersPerToken = 4;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cli/archive.ts` around lines 368 - 380, In the
createStageAdvanceProgressWriter function, add a comment above the
outputCharactersPerToken variable assignment explaining that the hardcoded value
of 4 is a rough English-language average for character-to-token conversion, and
that this approximation can vary significantly depending on the model being used
(different tokenizers), the language of the content (Chinese/Japanese typically
have fewer characters per token), and the type of content being tokenized (code
vs markdown vs plain text). This documents the assumption for future maintainers
and acknowledges the limitation of this estimate.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@README_zh-CN.md`:
- Line 188: The guidance text on line 188 in the README_zh-CN.md file contains
outdated information about the exploration mode process. Specifically, the
phrase "结构理解先用 `list/page`" contradicts the updated process that now recommends
starting with `chapter tree --json`. Update this line to replace the reference
to `list/page` with `chapter tree --json` to align with the current structure
understanding approach and maintain consistency throughout the documentation.

In `@README.md`:
- Line 188: The exploration mode guidance on line 188 is outdated and conflicts
with the new structure-first flow requirement. Update the sentence to indicate
that `chapter tree --json` should be the first step for structural
understanding, then use `list/page` for further refinement, and keep the
guidance about `find/grep` for candidate discovery and `read` for continuous
prose. This will align the exploration mode instructions with the updated
structure-first contract.

---

Nitpick comments:
In `@src/cli/archive.ts`:
- Around line 368-380: In the createStageAdvanceProgressWriter function, add a
comment above the outputCharactersPerToken variable assignment explaining that
the hardcoded value of 4 is a rough English-language average for
character-to-token conversion, and that this approximation can vary
significantly depending on the model being used (different tokenizers), the
language of the content (Chinese/Japanese typically have fewer characters per
token), and the type of content being tokenized (code vs markdown vs plain
text). This documents the assumption for future maintainers and acknowledges the
limitation of this estimate.

In `@src/facade/chapter.ts`:
- Around line 593-605: The word counting in the loop over fragmentIds is
performed sequentially, fetching and processing each fragment one at a time,
which can be slow for large collections. Replace the synchronous for loop that
calls serialFragments.getFragment(fragmentId) with a Promise.all() approach that
fetches all fragments in parallel, then accumulates the word counts from all
fetched fragments concurrently. This will maintain the same functionality while
improving performance for archives with many fragments.

In `@src/llm/client.ts`:
- Around line 375-385: The `#emitStreamProgress` method has an empty catch block
that silently suppresses any errors thrown by the `#onStreamProgress` callback,
making it difficult to debug callback implementation issues. Add error logging
to the catch block to capture and log the error before returning, ensuring that
callback failures are visible for debugging while still preventing the stream
from being interrupted by callback errors.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a00a0ff3-5b58-49ac-9328-80031b72c6fe

📥 Commits

Reviewing files that changed from the base of the PR and between a02ba7b and 4ce65ad.

📒 Files selected for processing (20)

README.md
README_zh-CN.md
data/help/commands/root.jinja
data/help/topics/ai.jinja
data/help/topics/command.jinja
data/help/topics/format.jinja
data/help/topics/index.jinja
data/help/topics/overview.jinja
data/help/topics/task.jinja
docs/en/ai-agents.md
docs/en/cli.md
docs/zh-CN/ai-agents.md
docs/zh-CN/cli.md
src/cli/archive.ts
src/cli/stage-runtime.ts
src/facade/chapter.ts
src/llm/client.ts
src/llm/index.ts
src/llm/types.ts
src/serial.ts

Moskize91 added 2 commits June 17, 2026 14:52

feat: improve build progress output

5c5febe

docs: route agents to chapter tree

4ce65ad

Moskize91 merged commit ec919d4 into main Jun 17, 2026
2 of 3 checks passed

Moskize91 deleted the feat/build-progress-line branch June 17, 2026 07:43

coderabbitai Bot reviewed Jun 17, 2026

View reviewed changes

coderabbitai Bot mentioned this pull request Jun 18, 2026

Add sdpub coordination and queue workflow #65

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: improve build progress output#62

feat: improve build progress output#62
Moskize91 merged 2 commits into
mainfrom
feat/build-progress-line

Moskize91 commented Jun 17, 2026

Uh oh!

coderabbitai Bot commented Jun 17, 2026 •

edited

Loading

❌ Failed checks (1 inconclusive)

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Moskize91 commented Jun 17, 2026

Uh oh!

coderabbitai Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Possibly Related PRs

❌ Failed checks (1 inconclusive)

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 17, 2026 •

edited

Loading