refactor: unified translation pipeline with OpenRouter, retry, and coverage dashboard #83
Open
xiaoyu2er wants to merge 246 commits into
…verage dashboard

Bug fixes:
- Heading hash collision: different heading levels now produce distinct MD5 hashes
- cache.save() preserves src field for source location tracking
- Validator guards against node count mismatch (LLM merges/splits nodes)
- Default docs-root fixed to content/en

Features:
- OpenRouter integration as default API (configurable via .env)
- Retry with exponential backoff (3 attempts, 2s/4s/8s)
- Truncation detection via finish_reason check
- Unified CLI: --status, --lang, --dry-run, --concurrency, --max-tokens
- Coverage dashboard: per-language translation progress report
- .env / .env.example for API key management

Cleanup:
- Deleted 10 legacy files (~1,800 lines): main.ts, openai.ts, utils.ts, config.ts, index.ts, chunk.ts, pipeline-demo.ts, pipeline.ts, usage.ts, logger.ts
- Deleted 4 legacy test files
- Removed unused dependencies: commander, cosmiconfig, gray-matter, micromatch, @anthropic-ai/sdk

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
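For illustration, the heading hash collision can be reproduced with a minimal sketch. It assumes the collision came from hashing only the heading text and that the fix folds the heading level into the hashed key; the PR does not show the actual implementation. `HeadingNode`, `hashTextOnly`, and `hashWithDepth` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical node shape; the pipeline's real AST types are not shown in this PR.
interface HeadingNode {
  depth: number; // 2 for "##", 3 for "###"
  text: string;
}

// Hashing only the text makes "## Title" and "### Title" collide.
function hashTextOnly(node: HeadingNode): string {
  return createHash("md5").update(node.text).digest("hex");
}

// Including the heading level in the hashed key keeps the levels distinct.
function hashWithDepth(node: HeadingNode): string {
  const key = `${"#".repeat(node.depth)} ${node.text}`;
  return createHash("md5").update(key).digest("hex");
}

const h2: HeadingNode = { depth: 2, text: "Title" };
const h3: HeadingNode = { depth: 3, text: "Title" };

console.log(hashTextOnly(h2) === hashTextOnly(h3)); // true: collision
console.log(hashWithDepth(h2) === hashWithDepth(h3)); // false: distinct hashes
```

With a text-only key, a cached translation for one heading level could be served for the other, which matches the "wrong heading levels" symptom described in the summary.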
…est free model for zh-hans)
Deploying with
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ❌ Deployment failed | nextjs-docs-latest | 41d7475 | Mar 21 2026, 10:08 PM |
When the LLM merges/splits/drops nodes during translation, the validator now uses cached translations as anchor points to align source and output nodes instead of skipping all cache updates. Also adds 405 to retryable errors (OpenRouter free models). Includes 42 translated zh-hans files from prior runs.
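A rough sketch of how the retry policy from this PR (exponential backoff at 2s/4s/8s, with 405 treated as retryable) might look. Only 405 is named in the commits; the other status codes, the error shape with a numeric `status` field, and every function name here are assumptions. "3 attempts" is read as up to three retries after the first call, which may differ from the real code. The `finish_reason === "length"` check follows the OpenAI-compatible response convention for truncated completions.

```typescript
// Assumed retryable statuses. Only 405 comes from this PR; the rest are illustrative.
const RETRYABLE_STATUS = new Set<number>([405, 429, 500, 502, 503]);

// Extract a numeric `status` from an unknown error, if present (assumed error shape).
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const s = (err as { status: unknown }).status;
    return typeof s === "number" ? s : undefined;
  }
  return undefined;
}

function isRetryable(err: unknown): boolean {
  const status = statusOf(err);
  return status !== undefined && RETRYABLE_STATUS.has(status);
}

// Delay before the n-th retry (1-based): 2000, 4000, 8000 ms.
function backoffMs(retry: number, baseMs = 2000): number {
  return baseMs * 2 ** (retry - 1);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `fn`, retrying retryable failures up to `maxRetries` times with backoff.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  for (let retry = 0; ; retry++) {
    try {
      return await fn();
    } catch (err) {
      if (retry >= maxRetries || !isRetryable(err)) throw err;
      await wait(backoffMs(retry + 1));
    }
  }
}

// A completion cut off by the token limit should not be cached.
function isTruncated(finishReason: string | null | undefined): boolean {
  return finishReason === "length";
}
```

Passing `wait` as a parameter keeps the backoff testable without real delays.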
Previously --max 10 took the first 10 files alphabetically, even if all 10 were already cached. Now it scans all files, skips cached ones, and limits only the number of files sent to the API.
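The difference between the old and new `--max` behavior can be sketched as follows. `selectOld`, `selectNew`, and the `isCached` callback are hypothetical stand-ins for the pipeline's real file scan and cache lookup.

```typescript
// Old behavior (as described): take the first `max` files alphabetically,
// then drop the cached ones, possibly leaving nothing to translate.
function selectOld(files: string[], isCached: (f: string) => boolean, max: number): string[] {
  return [...files].sort().slice(0, max).filter((f) => !isCached(f));
}

// New behavior: scan every file, skip cached ones, and apply the limit
// only to the files that would actually be sent to the API.
function selectNew(files: string[], isCached: (f: string) => boolean, max: number): string[] {
  return [...files].sort().filter((f) => !isCached(f)).slice(0, max);
}

const cachedFiles = new Set(["a.mdx", "b.mdx"]);
const allFiles = ["a.mdx", "b.mdx", "c.mdx", "d.mdx", "e.mdx"];
const isCachedFile = (f: string) => cachedFiles.has(f);

console.log(selectOld(allFiles, isCachedFile, 2)); // []
console.log(selectNew(allFiles, isCachedFile, 2)); // ["c.mdx", "d.mdx"]
```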
…(+27,559 entries)
Injects <!-- md5:hash --> before each translatable node in output files. Skips frontmatter to avoid breaking YAML parsing. Enables searching MD5 hashes directly in translated files.
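As a sketch of the injection step, assuming the output is rendered from a list of already-translated nodes that each carry the MD5 of their source node: the frontmatter is emitted untouched, and every node gets a comment on the line above it. `OutputNode` and `renderWithHashComments` are hypothetical names, and the hash values below are placeholders.

```typescript
// Hypothetical shape: `hash` identifies the node in the translation cache.
interface OutputNode {
  hash: string;
  text: string;
}

// Emit frontmatter as-is (no comment inside or before the YAML block),
// then prefix each translatable node with its <!-- md5:hash --> marker.
function renderWithHashComments(frontmatter: string | null, nodes: OutputNode[]): string {
  const body = nodes.map((n) => `<!-- md5:${n.hash} -->\n${n.text}`).join("\n\n");
  return frontmatter ? `${frontmatter}\n\n${body}\n` : `${body}\n`;
}

const rendered = renderWithHashComments("---\ntitle: Routing\n---", [
  { hash: "aaa111", text: "## 路由" },
  { hash: "bbb222", text: "这是一个段落。" },
]);
console.log(rendered);
```

Because the markers are plain HTML comments in the body, a text search for a hash in the translated files leads directly to the node it belongs to.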
…duplicate MD5 assignment
… node alignment

Instead of writing the LLM's raw output (which may have extra blank lines or split nodes), we now:
1. Translate → validate → update cache
2. Re-assemble from EN source + updated cache → write file

This guarantees output structure matches English source exactly.
When all nodes are cached after translation → use re-assembled output (structure matches EN exactly). When some nodes are uncached (anchor couldn't align them) → use LLM's original output to avoid English text in translated file. Logs a warning.
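The output selection described in these commits can be sketched like this, assuming the cache maps a source node's hash to its translation and that nodes are joined with a blank line. `SourceNode`, `Cache`, and `chooseOutput` are hypothetical names, not the PR's actual API.

```typescript
interface SourceNode {
  hash: string; // MD5 of the English source node
  text: string;
}

type Cache = Map<string, string>; // hash → translated text

interface Output {
  text: string;
  reassembled: boolean;
}

// Prefer re-assembly from EN source + cache; fall back to the raw LLM output
// when any node is still uncached, so no English text leaks into the file.
function chooseOutput(source: SourceNode[], cache: Cache, llmOutput: string): Output {
  const missing = source.filter((n) => !cache.has(n.hash));
  if (missing.length > 0) {
    console.warn(`${missing.length} node(s) uncached; writing raw LLM output`);
    return { text: llmOutput, reassembled: false };
  }
  const text = source.map((n) => cache.get(n.hash) as string).join("\n\n");
  return { text, reassembled: true };
}

const enNodes: SourceNode[] = [
  { hash: "h1", text: "# Title" },
  { hash: "h2", text: "A paragraph." },
];
const fullCache: Cache = new Map([["h1", "# 标题"], ["h2", "一个段落。"]]);
const partialCache: Cache = new Map([["h1", "# 标题"]]);

console.log(chooseOutput(enNodes, fullCache, "raw").reassembled); // true
console.log(chooseOutput(enNodes, partialCache, "raw").reassembled); // false
```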
Key additions:
- Preserve blank lines exactly (main cause of mismatch)
- Never remove the blank line between a paragraph and a code block
- Never merge paragraphs
- Paragraph count must match the input
Summary
- `## Title` and `### Title` now produce distinct MD5 hashes (previously identical, causing wrong heading levels in cached translations)
- … `.env`
- … `finish_reason=length` to prevent caching half-translated content
- `--status` for coverage report, `--lang` for translation, `--dry-run`, `--concurrency`, `--max-tokens`
- … `main.ts` / `openai.ts` path
Test Coverage
- `apps/docs/content/en` → `content/en`
Usage
TODOS
🤖 Generated with Claude Code