feat: improve build progress output#62
Conversation
Summary by CodeRabbit
WalkthroughThe PR makes two independent sets of changes. First, all documentation (README files, Jinja help templates, and Markdown docs in English and Chinese) is updated to insert Second, a streaming progress pipeline is added to the code: Possibly Related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches✨ Simplify code
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
README.md (1)
188-188:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate AI-agent mode guidance to match the new structure-first flow.
Line 188 still says to start structural understanding with
list/page, which now conflicts with the updated structure-mode contract (chapter tree --jsonfirst). Please align this line to avoid mixed instructions in the same README.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` at line 188, The exploration mode guidance on line 188 is outdated and conflicts with the new structure-first flow requirement. Update the sentence to indicate that `chapter tree --json` should be the first step for structural understanding, then use `list/page` for further refinement, and keep the guidance about `find/grep` for candidate discovery and `read` for continuous prose. This will align the exploration mode instructions with the updated structure-first contract.README_zh-CN.md (1)
188-188:⚠️ Potential issue | 🟡 Minor | ⚡ Quick win将 AI Agent 指引同步为“先 chapter tree --json”流程。
Line 188 仍写“结构理解先用
list/page”,与本次已更新的结构模式(先chapter tree --json)不一致,建议统一口径避免歧义。🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README_zh-CN.md` at line 188, The guidance text on line 188 in the README_zh-CN.md file contains outdated information about the exploration mode process. Specifically, the phrase "结构理解先用 `list/page`" contradicts the updated process that now recommends starting with `chapter tree --json`. Update this line to replace the reference to `list/page` with `chapter tree --json` to align with the current structure understanding approach and maintain consistency throughout the documentation.
🧹 Nitpick comments (3)
src/llm/client.ts (1)
375-385: ⚡ Quick winConsider logging suppressed callback errors.
The empty
catchblock silently suppresses all errors thrown by theonStreamProgresscallback. While this prevents callback failures from breaking the stream, it also hides programming errors in callback implementations, making debugging difficult.Consider logging the error before returning:
try { await this.#onStreamProgress({ outputCharacters }); } catch { - return; + // Log error but don't propagate to avoid breaking the stream + getLogger({ component: "llm" }).warn("Stream progress callback failed", error); + return; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/client.ts` around lines 375 - 385, The `#emitStreamProgress` method has an empty catch block that silently suppresses any errors thrown by the `#onStreamProgress` callback, making it difficult to debug callback implementation issues. Add error logging to the catch block to capture and log the error before returning, ensuring that callback failures are visible for debugging while still preventing the stream from being interrupted by callback errors.src/facade/chapter.ts (1)
593-605: ⚖️ Poor tradeoffConsider performance impact of synchronous word counting.
This code iterates all fragments and sentences synchronously to compute word counts for each chapter. For archives with many chapters and large fragment collections, this could introduce noticeable latency in
listChapters().Consider these options:
- Cache word counts when fragments are created/modified (store alongside fragment records)
- Parallelize the fragment iteration using
Promise.all()if the order doesn't matter- Profile actual performance with representative data to confirm if optimization is needed
Since this follows the existing pattern in the file and words are only computed during advancement (not on every read), the current implementation may be acceptable. However, it's worth monitoring if users report slow chapter listings.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/facade/chapter.ts` around lines 593 - 605, The word counting in the loop over fragmentIds is performed sequentially, fetching and processing each fragment one at a time, which can be slow for large collections. Replace the synchronous for loop that calls serialFragments.getFragment(fragmentId) with a Promise.all() approach that fetches all fragments in parallel, then accumulates the word counts from all fetched fragments concurrently. This will maintain the same functionality while improving performance for archives with many fragments.src/cli/archive.ts (1)
368-380: 💤 Low valueDocument the token estimation assumption.
The hardcoded
outputCharactersPerToken = 4(line 380) is a rough approximation that can vary significantly:
- By model: GPT-4 vs Claude vs other models have different tokenizers
- By language: Chinese/Japanese typically have fewer characters per token (~1-2), while English averages ~4
- By content: Code, markdown, and plain text tokenize differently
While this is only used for display (not billing), the estimated token count might be confusing or misleading to users working with non-English text or specific models.
Consider:
- Document the assumption in a comment explaining it's an English-language average
- Make it configurable via CLI flag or config file if needed
- Use model-specific defaults if the model provider/ID is known
For now, adding a comment would improve maintainability:
- const outputCharactersPerToken = 4; + // Rough estimate: ~4 chars/token for English text with common LLMs + // Actual ratio varies by model, language, and content type + const outputCharactersPerToken = 4;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cli/archive.ts` around lines 368 - 380, In the createStageAdvanceProgressWriter function, add a comment above the outputCharactersPerToken variable assignment explaining that the hardcoded value of 4 is a rough English-language average for character-to-token conversion, and that this approximation can vary significantly depending on the model being used (different tokenizers), the language of the content (Chinese/Japanese typically have fewer characters per token), and the type of content being tokenized (code vs markdown vs plain text). This documents the assumption for future maintainers and acknowledges the limitation of this estimate.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@README_zh-CN.md`:
- Line 188: The guidance text on line 188 in the README_zh-CN.md file contains
outdated information about the exploration mode process. Specifically, the
phrase "结构理解先用 `list/page`" contradicts the updated process that now recommends
starting with `chapter tree --json`. Update this line to replace the reference
to `list/page` with `chapter tree --json` to align with the current structure
understanding approach and maintain consistency throughout the documentation.
In `@README.md`:
- Line 188: The exploration mode guidance on line 188 is outdated and conflicts
with the new structure-first flow requirement. Update the sentence to indicate
that `chapter tree --json` should be the first step for structural
understanding, then use `list/page` for further refinement, and keep the
guidance about `find/grep` for candidate discovery and `read` for continuous
prose. This will align the exploration mode instructions with the updated
structure-first contract.
---
Nitpick comments:
In `@src/cli/archive.ts`:
- Around line 368-380: In the createStageAdvanceProgressWriter function, add a
comment above the outputCharactersPerToken variable assignment explaining that
the hardcoded value of 4 is a rough English-language average for
character-to-token conversion, and that this approximation can vary
significantly depending on the model being used (different tokenizers), the
language of the content (Chinese/Japanese typically have fewer characters per
token), and the type of content being tokenized (code vs markdown vs plain
text). This documents the assumption for future maintainers and acknowledges the
limitation of this estimate.
In `@src/facade/chapter.ts`:
- Around line 593-605: The word counting in the loop over fragmentIds is
performed sequentially, fetching and processing each fragment one at a time,
which can be slow for large collections. Replace the synchronous for loop that
calls serialFragments.getFragment(fragmentId) with a Promise.all() approach that
fetches all fragments in parallel, then accumulates the word counts from all
fetched fragments concurrently. This will maintain the same functionality while
improving performance for archives with many fragments.
In `@src/llm/client.ts`:
- Around line 375-385: The `#emitStreamProgress` method has an empty catch block
that silently suppresses any errors thrown by the `#onStreamProgress` callback,
making it difficult to debug callback implementation issues. Add error logging
to the catch block to capture and log the error before returning, ensuring that
callback failures are visible for debugging while still preventing the stream
from being interrupted by callback errors.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a00a0ff3-5b58-49ac-9328-80031b72c6fe
📒 Files selected for processing (20)
README.mdREADME_zh-CN.mddata/help/commands/root.jinjadata/help/topics/ai.jinjadata/help/topics/command.jinjadata/help/topics/format.jinjadata/help/topics/index.jinjadata/help/topics/overview.jinjadata/help/topics/task.jinjadocs/en/ai-agents.mddocs/en/cli.mddocs/zh-CN/ai-agents.mddocs/zh-CN/cli.mdsrc/cli/archive.tssrc/cli/stage-runtime.tssrc/facade/chapter.tssrc/llm/client.tssrc/llm/index.tssrc/llm/types.tssrc/serial.ts
No description provided.