Refactor architecture validation gates #43
Thank you for raising your pull request and contributing to VoScript.
Codecov Report
❌ Patch coverage is
Coverage Diff
| | main | #43 | +/- |
|---|---|---|---|
| Coverage | 90.90% | 90.52% | -0.39% |
| Files | 84 | 95 | +11 |
| Lines | 3696 | 4359 | +663 |
| Hits | 3360 | 3946 | +586 |
| Misses | 336 | 413 | +77 |
Flags with carried forward coverage won't be shown.
Claude encountered an error after 1m 38s (code review in progress).
Pull request overview
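This PR centers on "architecture rings/boundaries plus release/CI gate hardening". It refactors the layering of transcription upload and persistence reads/writes (application/infra) and switches the Rust kernel's default posture to required. It also fills in consistency validation for the alignment (WhisperX) offline/cache modes, the resource budgets (disk, concurrency, overlong audio), and the public docs/contracts.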
Changes:
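- Move transcription upload submission, record read/export, and the admission policy (active/in-flight/disk) down into app/application/*, and add or strengthen the infra repositories and the runtime job API.
- Strengthen the "duration budget" short-circuit paths for alignment/denoise/embedding, and fix a WhisperX cache-only case in which already-imported HF/Transformers modules could still reach the network.
- Harden the CI/Release workflows: introduce exact source ref resolution, add an architecture/docs drift gate, rerun the heavy gate (including on synchronize), and bump the default runtime posture and the docs to 0.8.5 / RUST_KERNEL_MODE=required.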
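The "duration budget" short-circuit named in the changes above can be pictured with a minimal sketch. It is an illustration under assumptions, not the PR's implementation: `DurationBudget`, `run_step`, the `duration_budget_exceeded` reason string, and the choice to treat an unknown duration as within budget are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class DurationBudget:
    """Hypothetical budget: the longest audio (in seconds) a step will process."""

    max_seconds: float

    def exceeded_by(self, duration_seconds: Optional[float]) -> bool:
        # Assumption of this sketch: an unknown duration counts as within budget.
        if duration_seconds is None:
            return False
        return duration_seconds > self.max_seconds


def run_step(
    name: str,
    duration_seconds: Optional[float],
    budget: DurationBudget,
    work: Callable[[], Any],
) -> Dict[str, Any]:
    """Run work() unless the audio exceeds the budget, and report the outcome."""
    if budget.exceeded_by(duration_seconds):
        # Short-circuit before the expensive step loads any audio.
        return {"step": name, "status": "skipped", "reason": "duration_budget_exceeded"}
    return {"step": name, "status": "ok", "result": work()}
```

The point of the pattern is that the budget check only needs duration metadata, so an overlong file is rejected before the step reads the full audio into memory.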
Reviewed changes
Copilot reviewed 100 out of 101 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_voiceprint_db.py | Update the concurrent dedup test to use the new application/infra boundary |
| tests/unit/test_transcription_submission.py | Add failure-path and ring-constraint tests for the upload submission use case |
| tests/unit/test_transcription_records.py | Add tests for record read/export/correction and the "infra repository boundary" |
| tests/unit/test_provider_registry.py | Verify the separation of responsibilities between the providers facade and pipeline.registry, plus override semantics |
| tests/unit/test_provider_capabilities.py | Add checks for static capability metadata and alignment capability ownership |
| tests/unit/test_pipeline_runner.py | Add runner tests for capability preflight and the public-safe metadata contract |
| tests/unit/test_kernel_release_gates.py | Harden release/heavy gate assertions and the default Rust required posture |
| tests/unit/test_kernel_bridge.py | Update the Rust bridge version and the rollback-semantics tests |
| tests/unit/test_job_runtime.py | Add tests for the runtime job store API and admission counting/atomicity |
| tests/unit/test_docs_code_drift_gate.py | Add regression tests for the docs/code drift gate |
| tests/unit/test_config_defaults.py | Assert the default RUST_KERNEL_MODE=required and its consistency with compose/docs |
| tests/unit/test_audio_layers.py | Add tests for audio duration metadata and denoise/normalize typed-error behavior |
| tests/unit/test_artifact_status_schema_contracts.py | Move the status helper down into infra and add contract tests for pipeline metadata/public alignment normalization |
| tests/unit/test_api_route_coverage.py | Update API behavior coverage: records/submission, admission 503, input validation, and voiceprints path-error mapping |
| tests/unit/test_admission.py | Add unit tests for the admission policy (active/in-flight/disk) |
| tests/test_voiceprint_db.py | Explicitly pin the legacy DB tests to Python semantics (Rust paths disabled) |
| tests/test_security.py | Security regression: module reload scope, error-message assertions, and the submission/jobs migration |
| tests/e2e/test_api_core.py | E2E no longer falls back by default (an explicit switch is required) |
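| docker-compose.yml | Add env vars for admission/budgets and for the alignment/embedding/denoise budgets; default to Rust required |
| doc/security.zh.md | Bump the documented version to 0.8.5 |
| doc/security.en.md | Bump the documented version to 0.8.5 |
| doc/quickstart.zh.md | Update the noisereduce gate wording and the budget-setting descriptions |
| doc/quickstart.en.md | Update the noisereduce gate wording and the budget-setting descriptions |
| doc/configuration.zh.md | Bump the configuration index to 0.8.5 and document admission/budgets and the Rust required default |
| doc/configuration.en.md | Bump the configuration index to 0.8.5 and document admission/budgets and the Rust required default |
| doc/changelog.zh.md | Add the 0.8.5 changelog (Rust required default, CI gates, stricter E2E, etc.) |
| doc/changelog.en.md | Add the 0.8.5 changelog (Rust required default, CI gates, stricter E2E, etc.) |
| doc/api.zh.md | Fill in API docs for audio download/voiceprints get and the admission 503 wording; adjust alignment fields |
| doc/api.en.md | Fill in API docs for audio download/voiceprints get and the admission 503 wording; adjust alignment fields |
| crates/voscript_core/src/lib.rs | Update the Rust package version assertion to 0.8.5 |
| crates/voscript_core/Cargo.toml | Bump the Rust crate version to 0.8.5 |
| CLAUDE.md | Add rules on the repository's writing language and evidence retention |
| Cargo.lock | Update the Rust crate lockfile version to 0.8.5 |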
| app/providers/voiceprint_match/__init__.py | The providers facade forbids selection by name (default only) and steers callers to PipelineRunner/registry |
| app/providers/vad/__init__.py | Same as above: the facade allows only default |
| app/providers/punc/__init__.py | Same as above: the facade allows only default |
| app/providers/postprocess/__init__.py | Same as above: the facade allows only default |
| app/providers/normalize/default.py | The normalize timeout now raises a typed error (decoupled from FastAPI) |
| app/providers/normalize/__init__.py | The normalize facade allows only default |
| app/providers/kernel_bridge/release_gates.py | Update the release gate set/wording and the Python owner paths |
| app/providers/ingest/__init__.py | The ingest facade allows only default |
| app/providers/enhance/default.py | Add a denoise duration-budget short-circuit to avoid fully loading overlong audio |
| app/providers/enhance/__init__.py | The enhance facade allows only default |
| app/providers/embedding/default.py | Add a duration budget for the embedding full preload; long audio is read in per-turn segments instead |
| app/providers/embedding/__init__.py | The embedding facade allows only default |
| app/providers/diarization/default.py | WhisperX cache-only fix (sync the modules' offline flags) and an alignment duration-budget short-circuit |
| app/providers/diarization/__init__.py | The diarization facade allows only default |
| app/providers/capabilities.py | Introduce static capability metadata, alias normalization, and a NotFound error type |
| app/providers/asr/__init__.py | The asr facade allows only default |
| app/providers/artifacts/default.py | Normalize alignment metadata to a public-safe form before writing it to artifacts |
| app/providers/artifacts/__init__.py | The artifacts facade allows only default |
| app/providers/_registry.py | Add a facade selection guard (rejects provider_name != default) |
| app/providers/__init__.py | Switch the providers package to lazy exports to avoid re-exporting pipeline registry helpers |
| app/postprocess/alignment.py | Extract pure alignment post-processing helpers (for use by diarization) |
| app/pipeline/step_keys.py | Extract canonical step key/alias normalization |
| app/pipeline/stages/voiceprint_match/__init__.py | Stages now call resolve_provider directly and invoke the provider interface |
| app/pipeline/stages/vad/__init__.py | Same as above: the stage calls resolve_provider directly |
| app/pipeline/stages/punc/__init__.py | Same as above: the stage calls resolve_provider directly |
| app/pipeline/stages/postprocess/__init__.py | Same as above: the stage calls resolve_provider directly |
| app/pipeline/stages/normalize/__init__.py | Same as above: the stage calls resolve_provider directly and uses the contract request |
| app/pipeline/stages/ingest/__init__.py | Same as above: the stage calls resolve_provider directly |
| app/pipeline/stages/enhance/__init__.py | Same as above: the stage calls resolve_provider directly and uses the contract request |
| app/pipeline/stages/embedding/__init__.py | Same as above: the stage calls resolve_provider directly and uses the contract request |
| app/pipeline/stages/diarization/alignment.py | The old path becomes a compatibility re-export |
| app/pipeline/stages/diarization/__init__.py | The diarization stage normalizes alignment metadata to a public-safe subset |
| app/pipeline/stages/asr/__init__.py | The stage calls resolve_provider directly and uses the contract request |
| app/pipeline/stages/artifacts/__init__.py | The stage calls resolve_provider directly and invokes provider.build |
| app/pipeline/runner.py | Add capability preflight/skip metadata and call infra cleanup via dynamic import |
| app/pipeline/registry.py | Normalize via step_keys and add is_provider_override |
| app/pipeline/orchestrator.py | Break the static pipeline->infra/provider dependency edge via dynamic import |
| app/pipeline/errors.py | Extract pipeline lookup errors into a standalone module |
| app/pipeline/contracts/requests.py | Provider/stage key normalization now uses step_keys |
| app/pipeline/contracts/normalize.py | Add normalize typed errors |
| app/pipeline/contracts/metadata.py | Add contracts for pipeline metadata ownership and public-safe alignment metadata normalization |
| app/pipeline/contracts/errors.py | Compatibility re-export to the new errors module |
| app/pipeline/contracts/__init__.py | Update the aggregate exports: add metadata/normalize errors and remove the direct status helper export |
| app/infra/transcription_records.py | Add a filesystem repository for transcription records (status/result/audio) |
| app/infra/job_status.py | Move the status payload contract helper down into infra |
| app/infra/job_runtime.py | Runtime job store API, in-flight/active admission primitives, and counting helpers |
| app/infra/job_persistence.py | Add a public atomic-write adapter and status write/discard helpers |
| app/infra/audio/paths.py | Path validation raises a typed error instead of HTTPException |
| app/infra/audio/metadata.py | Add a helper for reading audio duration metadata |
| app/infra/audio/errors.py | Add audio path typed errors |
| app/infra/audio/__init__.py | Export the new metadata/errors |
| app/Dockerfile | Require the wheel by default (fail closed when it is missing) and clarify rollback semantics |
| app/config.py | Bump the version to 0.8.5, default to Rust required, and add admission/budget configuration |
| app/application/transcription_records.py | Add application records use cases (status/list/audio/export/correction) |
| app/application/transcription_jobs.py | Job status writes go through the infra API, and admission is released in finally |
| app/application/admission.py | Add the admission policy implementation (active/in-flight/disk) |
| app/api/routers/voiceprints.py | Map audio path typed errors to a 400 HTTPException |
| .github/workflows/rust-foundation-heavy.yml | The heavy gate adds resolve-source, triggers on PR synchronize, and enforces an exact ref |
| .github/workflows/release.yml | Split and harden the release workflow: resolve-source, lint/unit/security, wheel artifact, docker smoke, exact-ref build/push |
| .github/workflows/ci.yml | Add the architecture/docs drift gate and update the pip-audit ignore list |
| .env.example | Default to Rust required and add env vars for admission/budgets and for the alignment/embedding/denoise budgets |
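Several rows above refer to an admission policy over active jobs, in-flight uploads, and disk space, with rejections surfaced as HTTP 503. A minimal sketch of that kind of policy follows. Everything in it is an assumption for illustration: `AdmissionLimits`, `AdmissionRejected`, `check_admission`, and the reason strings are hypothetical names, not the contents of app/application/admission.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdmissionLimits:
    """Hypothetical limits; real values would come from configuration."""

    max_active_jobs: int
    max_in_flight_uploads: int
    min_free_disk_bytes: int


class AdmissionRejected(Exception):
    """Typed error that an API layer could translate into an HTTP 503."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


def check_admission(
    active_jobs: int,
    in_flight_uploads: int,
    free_disk_bytes: int,
    limits: AdmissionLimits,
) -> None:
    """Raise AdmissionRejected if accepting one more upload would break a limit."""
    if active_jobs >= limits.max_active_jobs:
        raise AdmissionRejected("too_many_active_jobs")
    if in_flight_uploads >= limits.max_in_flight_uploads:
        raise AdmissionRejected("too_many_in_flight_uploads")
    if free_disk_bytes < limits.min_free_disk_bytes:
        raise AdmissionRejected("insufficient_disk_space")
```

Raising a typed error rather than an HTTP exception keeps the policy free of web-framework imports, which matches the direction of the typed-error rows in the table. A real implementation would also need the check and the counter increment to be atomic, which this sketch does not attempt.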
posix_path = PurePosixPath(value)
windows_path = PureWindowsPath(value)
if (
    value in {".", ".."}
    or posix_path.is_absolute()
    or windows_path.is_absolute()
    or posix_path.name != value
    or windows_path.name != value
):
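The condition in the snippet above rejects any value that is not a bare file name under either POSIX or Windows path rules. A self-contained sketch of the same check is given here for reference. The wrapper name `is_plain_filename` is hypothetical, and the empty-string rejection is an addition of this sketch that the quoted snippet does not show.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_plain_filename(value: str) -> bool:
    """Return True only if value is a single path component on POSIX and Windows."""
    # Addition of this sketch: an empty string would pass the checks below.
    if not value:
        return False
    posix_path = PurePosixPath(value)
    windows_path = PureWindowsPath(value)
    rejected = (
        value in {".", ".."}                  # current/parent directory references
        or posix_path.is_absolute()           # e.g. "/etc/passwd"
        or windows_path.is_absolute()         # e.g. "C:\\data\\a.wav"
        or posix_path.name != value           # contains "/" separators
        or windows_path.name != value         # contains "\\" separators or a drive
    )
    return not rejected
```

Checking both path flavors matters because a value such as `a\b.wav` is a single component on POSIX but a nested path on Windows, so validating against only the host platform's rules would let it through on Linux.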
}
transcriptions._write_status(job_id, "completed", filename=audio_path.name)
out_dir = transcriptions.TRANSCRIPTIONS_DIR / job_id
job_persistence._write_status(job_id, "completed", filename=audio_path.name)

Summary
Validation
Remote ai-wan deployment is running image tag voscript:6725b41c540e-aiwan-full-rust with RUST_KERNEL_MODE=required.