refactor: unified translation pipeline with OpenRouter, retry, and coverage dashboard by xiaoyu2er · Pull Request #83 · xiaoyu2er/nextjs-i18n-docs

xiaoyu2er · 2026-03-19T18:02:31Z

Summary

Bug fix: heading hash collision — ## Title and ### Title now produce distinct MD5 hashes (previously identical, causing wrong heading levels in cached translations)
Bug fix: cache.save() preserves src field — source location tracking no longer lost on save
Bug fix: validator node count guard — if the LLM merges/splits paragraphs, cache update is skipped to prevent silent corruption
Feature: OpenRouter integration — default API backend, configured via .env
Feature: retry with exponential backoff — 3 attempts (2s/4s/8s) for rate limits, timeouts, and provider outages
Feature: truncation detection — checks finish_reason=length to prevent caching half-translated content
Feature: unified CLI — --status for coverage report, --lang for translation, --dry-run, --concurrency, --max-tokens
Feature: coverage dashboard — per-language translation progress (cached/missing/coverage%)
Cleanup: deleted 10 legacy files (~1,800 lines) — removed dead main.ts/openai.ts path
Cleanup: removed unused deps — commander, cosmiconfig, gray-matter, micromatch, @anthropic-ai/sdk
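
The retry feature in the summary can be illustrated with a minimal sketch. This is not the PR's code: `withRetry`, `HttpError`, `isRetryable`, and the injectable `sleep` are hypothetical names, and the list of retryable status codes is an assumption (the summary only names rate limits, timeouts, and provider outages). How the 2s/4s/8s delays map onto "3 attempts" is not spelled out, so the sketch simply waits through the delay list in order and gives up when it runs out.

```typescript
// Hypothetical helper: names and the retryable-status list are assumptions.
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

class HttpError extends Error {
  status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

// Assumed set: rate limit (429) plus common provider-outage codes.
function isRetryable(err: unknown): boolean {
  return err instanceof HttpError && [429, 500, 502, 503, 504].includes(err.status);
}

async function withRetry<T>(
  fn: () => Promise<T>,
  delaysMs: number[] = [2000, 4000, 8000],
  sleep: Sleep = defaultSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const delay = delaysMs[attempt];
      // Give up on non-retryable errors or once every delay has been used.
      if (!isRetryable(err) || delay === undefined) throw err;
      await sleep(delay);
    }
  }
}
```

Passing `sleep` as a parameter keeps the backoff schedule testable without real waiting.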

Test Coverage

76 tests pass (8 test files, 0 failures)
Updated parser test: heading levels produce different hashes
Added cache test: src field round-trip through save/load
Added validator test: node count mismatch guard
Added translator test: stripThinkingBlock
Deleted 4 legacy test files (chunk, config, logger, usage)
Fixed integration test paths: apps/docs/content/en → content/en
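
To make the heading-hash test concrete, here is a small sketch of how such a collision can arise and one way to avoid it. It assumes the collision came from hashing only the heading text; the PR does not show the actual cause or fix. `HeadingNode`, `hashTextOnly`, and `hashWithLevel` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical node shape; the real parser's types are not shown in this PR.
interface HeadingNode {
  depth: number; // 2 for "##", 3 for "###"
  text: string;
}

// Hashes the text only: headings with the same title collide across levels.
function hashTextOnly(node: HeadingNode): string {
  return createHash("md5").update(node.text).digest("hex");
}

// Hashes the markdown form, so the heading level becomes part of the key.
function hashWithLevel(node: HeadingNode): string {
  const markdown = `${"#".repeat(node.depth)} ${node.text}`;
  return createHash("md5").update(markdown).digest("hex");
}
```

With a text-only key, a cached translation for `## Title` would also be served for `### Title`, which matches the wrong-heading-level symptom described in the summary.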

Usage

# Check translation coverage
bun run packages/translate/src/batch.pipeline.ts --status --docs-root content-v15/en

# Translate v15 to zh-hans
bun run packages/translate/src/batch.pipeline.ts --docs-root content-v15/en --lang zh-hans --output-dir content-v15

# Dry run
bun run packages/translate/src/batch.pipeline.ts --docs-root content-v15/en --lang zh-hans --dry-run

TODOS

Phase 2: translate v15 to remaining 7 languages (after zh-hans quality validated)
Update translation config docsContext reference

🤖 Generated with Claude Code

…verage dashboard

Bug fixes:
- Heading hash collision: different heading levels now produce distinct MD5 hashes
- cache.save() preserves src field for source location tracking
- Validator guards against node count mismatch (LLM merges/splits nodes)
- Default docs-root fixed to content/en

Features:
- OpenRouter integration as default API (configurable via .env)
- Retry with exponential backoff (3 attempts, 2s/4s/8s)
- Truncation detection via finish_reason check
- Unified CLI: --status, --lang, --dry-run, --concurrency, --max-tokens
- Coverage dashboard: per-language translation progress report
- .env / .env.example for API key management

Cleanup:
- Deleted 10 legacy files (~1,800 lines): main.ts, openai.ts, utils.ts, config.ts, index.ts, chunk.ts, pipeline-demo.ts, pipeline.ts, usage.ts, logger.ts
- Deleted 4 legacy test files
- Removed unused dependencies: commander, cosmiconfig, gray-matter, micromatch, @anthropic-ai/sdk

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
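
The truncation check mentioned in this commit message can be sketched as below. The response shape is the minimal subset of an OpenAI-compatible chat completion that the check needs; `extractCompleteText` and `TruncatedResponseError` are hypothetical names, not identifiers from the PR.

```typescript
// Minimal subset of an OpenAI-compatible chat completion response.
interface ChatCompletion {
  choices: { finish_reason: string | null; message: { content: string | null } }[];
}

class TruncatedResponseError extends Error {}

// Returns the translated text, or throws so the caller never caches a partial result.
function extractCompleteText(response: ChatCompletion): string {
  const choice = response.choices[0];
  if (!choice) throw new Error("response contained no choices");
  if (choice.finish_reason === "length") {
    // The model stopped because it ran out of output tokens.
    throw new TruncatedResponseError("output hit the token limit; not caching");
  }
  return choice.message.content ?? "";
}
```

Throwing rather than returning a partial string means a caller cannot cache half-translated content by accident.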

cloudflare-workers-and-pages · 2026-03-19T18:12:29Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ❌ Deployment failed | nextjs-docs-latest | `41d7475` | Mar 21 2026, 10:08 PM |

When the LLM merges/splits/drops nodes during translation, the validator now uses cached translations as anchor points to align source and output nodes instead of skipping all cache updates. Also adds 405 to retryable errors (OpenRouter free models). Includes 42 translated zh-hans files from prior runs.
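
One possible reading of anchor-based alignment is sketched here. It is not the PR's implementation. It assumes an anchor is a source node whose cached translation appears verbatim in the LLM output, and that the nodes between two anchors are paired one-to-one only when both sides have the same count. `alignByAnchors` and `Pair` are hypothetical names, and the cache is keyed by source text for simplicity.

```typescript
// Hypothetical shapes; the real validator's types are not shown in this PR.
type Pair = { sourceIndex: number; outputIndex: number };

function alignByAnchors(
  source: string[],
  output: string[],
  cache: Map<string, string>, // source text -> cached translation (assumed keying)
): Pair[] {
  // 1. Anchors: cached translations found verbatim in the output, in order.
  const anchors: Pair[] = [];
  let searchFrom = 0;
  source.forEach((text, i) => {
    const cached = cache.get(text);
    if (cached === undefined) return;
    const j = output.indexOf(cached, searchFrom);
    if (j !== -1) {
      anchors.push({ sourceIndex: i, outputIndex: j });
      searchFrom = j + 1;
    }
  });

  // 2. Pair up the nodes in each gap between anchors when the counts agree.
  const pairs: Pair[] = [...anchors];
  const end: Pair = { sourceIndex: source.length, outputIndex: output.length };
  let prev: Pair = { sourceIndex: -1, outputIndex: -1 };
  for (const next of [...anchors, end]) {
    const srcGap = next.sourceIndex - prev.sourceIndex - 1;
    const outGap = next.outputIndex - prev.outputIndex - 1;
    if (srcGap > 0 && srcGap === outGap) {
      for (let d = 1; d <= srcGap; d++) {
        pairs.push({ sourceIndex: prev.sourceIndex + d, outputIndex: prev.outputIndex + d });
      }
    }
    prev = next;
  }
  return pairs.sort((a, b) => a.sourceIndex - b.sourceIndex);
}
```

Under these assumptions, a merge or split only invalidates the gap it occurs in; nodes in other gaps can still update the cache.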

… node alignment

Instead of writing the LLM's raw output (which may have extra blank lines or split nodes), we now:

1. Translate → validate → update cache
2. Re-assemble from EN source + updated cache → write file

This guarantees output structure matches English source exactly.

When all nodes are cached after translation → use re-assembled output (structure matches EN exactly). When some nodes are uncached (anchor couldn't align them) → use LLM's original output to avoid English text in translated file. Logs a warning.
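
The choice between re-assembled and raw output described above could look roughly like this. It is a sketch under stated assumptions: `SourceNode`, `chooseOutput`, the cache keyed by a per-node key such as the English MD5, and joining nodes with a blank line are all illustrative, not taken from the PR.

```typescript
// Hypothetical shapes: the cache maps an English node's key to its translation.
interface SourceNode {
  key: string;
  text: string;
}

function chooseOutput(
  sourceNodes: SourceNode[],
  cache: Map<string, string>,
  llmOutput: string,
  warn: (msg: string) => void = console.warn,
): string {
  const missing = sourceNodes.filter((node) => !cache.has(node.key));
  if (missing.length > 0) {
    // Re-assembling now would leave English text in the translated file.
    warn(`${missing.length} node(s) uncached; writing raw LLM output`);
    return llmOutput;
  }
  // Every node is cached: rebuild from the English structure.
  return sourceNodes.map((node) => cache.get(node.key) ?? node.text).join("\n\n");
}
```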

xiaoyu2er and others added 2 commits March 19, 2026 11:02

chore: update default model to stepfun/step-3.5-flash:free (tested, b…

b05cde5

…est free model for zh-hans)

xiaoyu2er added 27 commits March 19, 2026 13:17

fix: skip translation when all files are fully cached

0b8b4a0

fix: --max now limits files to translate, not files to scan

fddb16d

Previously --max 10 took the first 10 files alphabetically, even if all 10 were already cached. Now it scans all files, skips cached ones, and limits only the number of files sent to the API.
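
A minimal sketch of the new `--max` behavior, assuming a per-file cache check is available. `selectFilesToTranslate` and `isFullyCached` are hypothetical names standing in for whatever the pipeline actually uses.

```typescript
// Hypothetical helper: filter out cached files first, then apply the limit.
function selectFilesToTranslate(
  files: string[],
  isFullyCached: (file: string) => boolean,
  max?: number,
): string[] {
  const pending = files.filter((file) => !isFullyCached(file));
  return max === undefined ? pending : pending.slice(0, max);
}
```

Applying the limit before the filter, as in `files.slice(0, max)`, reproduces the old behavior: the quota can be spent entirely on files that need no API call.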

chore: backfill zh-hans cache from existing translations (+2,522 entr…

035328a

…ies)

fix: include file path in node count mismatch warning

d15ef55

chore: backfill cache for all 8 languages from existing translations …

dce1d04

…(+27,559 entries)

chore: delete en.jsonl (replaced by --lookup), add --lookup command

307c88c

feat: annotate translated files with md5 comments for quick lookup

f2e692a

Injects an MD5 comment before each translatable node in output files. Skips frontmatter to avoid breaking YAML parsing. Enables searching MD5 hashes directly in translated files.

feat: add --annotate command to add md5 comments to existing files

2a52d7b

fix: use MDX-compatible comments for md5 annotations ({/* */} not <!-…

8df2d61

…- -->)

fix: md5 annotations now use English source MD5 (cache key), not tran…

5bb1c69

…slated text MD5

fix: --annotate auto-detects English source by replacing lang code in…

3d9798f

… path

fix: annotate uses cache anchors for mismatched node counts, prevent …

84a3b69

…duplicate MD5 assignment

chore: delete 83 mismatched zh-hans translation files for re-translation

0caecfe

fix: default config path to packages/translate/translation.config.exa…

13fc348

…mple.mjs

fix: remove auto-annotation from translate pipeline, keep as --annota…

37d59c8

…te only

chore: remove all md5 annotations from translated files

1766ea7

chore: remove LLM-generated md5 annotations from translated files

b179ed2

fix: auto-backfill cache from LLM output when anchor alignment has gaps

00da095

feat: add --repair mode to fix mismatched files without re-translating

eea39cb

fix: add structure preservation rules to translation prompt

2b2a31b

Key additions:
- Preserve blank lines exactly (main cause of mismatch)
- Never remove blank line between paragraph and code block
- Never merge paragraphs
- Paragraph count must match input

feat: json-based translation mode to eliminate structure mismatch

98e9b97

fix: remove leftover NEEDS_TRANSLATION markers from translated files

3dc9801

fix: repair broken JSON from LLM (unescaped newlines), better missing…

7a08f83

… node logging
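
For illustration, one way to repair JSON whose string values contain raw newlines is to escape them while scanning. This is an assumption about what such a repair might involve, not the commit's code, and `escapeNewlinesInStrings` is a hypothetical name.

```typescript
// Escapes raw CR/LF characters that appear inside JSON string literals.
// Whitespace outside strings (pretty-printing) is left untouched.
function escapeNewlinesInStrings(raw: string): string {
  let out = "";
  let inString = false;
  let escaped = false;
  for (let i = 0; i < raw.length; i++) {
    const ch = raw.charAt(i);
    if (!inString) {
      if (ch === '"') inString = true;
      out += ch;
    } else if (escaped) {
      // Character following a backslash: copy it through unchanged.
      out += ch;
      escaped = false;
    } else if (ch === "\\") {
      out += ch;
      escaped = true;
    } else if (ch === '"') {
      out += ch;
      inString = false;
    } else if (ch === "\n") {
      out += "\\n";
    } else if (ch === "\r") {
      out += "\\r";
    } else {
      out += ch;
    }
  }
  return out;
}
```

The sketch handles only unescaped newlines; other malformations, such as unescaped quotes, would need separate handling.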

fix: delete 24 files with broken YAML frontmatter for re-translation

4d2a32f

xiaoyu2er added 30 commits March 21, 2026 09:40

style: stronger gutter border for consistent visibility

73c7210

style: consistent 1px border, keep gutter opaque in gap rows

5e11048

refactor: unify frontmatter in translateJsonChunk

8f9e533

feat: context menu with copy MD5 and delete cache

13df3cf

fix: import getCache in status routes

171c750

refactor: always use md5 mode, remove file/md5 toggle

aca1ef1

fix: remove stale mode reference from JobDialog

c1d906b

fix: remove remaining mode reference in JobDialog

df6e6c1

feat: scope MD5 translation to selected files via --files

be9d6d4

fix: show files filter in MD5 mode output

61d0479

feat: translate related.title and related.description in frontmatter

e869cbf

fix: update thematicBreak test, add frontmatter test

80cb7eb

feat: auto assemble + prepare-content after translation job completes

f6153dc

fix: add locale prefix to related links in MarkdownContent

439ac85

feat: click filename in preview to open in VS Code

9f70b32

fix: use absolute path for vscode:// link, guard against undefined

5dfe6bd

fix: correct projectRoot path for vscode:// links

a672d31

refactor: open file via server API instead of vscode:// protocol

117f2d8

fix: auto-detect editor (code/cursor/zed), fallback to OS open

f35ef7b

feat: copy button for job logs

bd05245

feat: incremental prepare-content, only write changed files

41d7475

feat: integrate docs-i18n npm package, add i18n:* scripts

9d7f0df

chore: simplify i18n scripts to use bin name

2cc74a2

chore: update docs-i18n to 0.2.2

67c1d3b

chore: update docs-i18n to 0.2.3

bc4c4f1

chore: update docs-i18n to 0.2.4

f8372f4

chore: update docs-i18n to 0.3.2, add vite as devDep

80b4b28

feat: pass OPENROUTER_API_KEY from env to docs-i18n config

0c58920

chore: update docs-i18n to 0.4.0

f768c89

chore: update docs-i18n to 0.4.1

cc9c549

xiaoyu2er wants to merge 246 commits into main from feat/unified-translate-pipeline
