feat: add Rust postprocess kernel #37

Merged
MapleEve merged 1 commit into main from feat/0.8.2-postprocess-rust-kernel on Jun 9, 2026

MapleEve (Owner) commented Jun 9, 2026

Summary

  • Add a pure Python postprocess oracle for result segment assembly, display-name disambiguation, conservative text-only segment merge, and JSON-safe word normalization.
  • Add the matching Rust postprocess kernel and Python bridge entrypoint for RUST_KERNEL_MODE=required, while keeping the default Python path.
  • Extend Docker heavy-gate smoke, changelog, configuration docs, and unit coverage for the new selected Rust-backed path.
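
The mode selection described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `select_postprocess`, `python_postprocess`, and the `rust_impl` parameter are hypothetical names, and raising when the Rust kernel is missing under `RUST_KERNEL_MODE=required` is an assumption about what "required" means here.

```python
import os
from typing import Callable, Mapping, Optional

Segment = dict
Postprocess = Callable[[list], list]


def python_postprocess(segments: list) -> list:
    """Hypothetical stand-in for the pure Python oracle (default path)."""
    return [dict(segment) for segment in segments]


def select_postprocess(
    rust_impl: Optional[Postprocess] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Postprocess:
    """Pick the postprocess implementation from RUST_KERNEL_MODE.

    Assumption: only the value "required" switches to the Rust kernel,
    and a missing kernel in that mode is a hard error rather than a
    silent fallback to Python.
    """
    env = os.environ if environ is None else environ
    mode = env.get("RUST_KERNEL_MODE", "").strip().lower()
    if mode == "required":
        if rust_impl is None:
            raise RuntimeError(
                "RUST_KERNEL_MODE=required but the Rust kernel is unavailable"
            )
        return rust_impl
    # Any other value (including unset) keeps the default Python path.
    return python_postprocess
```

Passing the environment in as a mapping keeps the selection testable without mutating `os.environ`.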

Validation

  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python -m pytest tests/unit/ tests/test_security.py tests/test_voiceprint_db.py tests/test_job_service.py -q --tb=short
  • cargo fmt --manifest-path crates/voscript_core/Cargo.toml -- --check
  • cargo test --manifest-path crates/voscript_core/Cargo.toml
  • cargo clippy --manifest-path crates/voscript_core/Cargo.toml --features python-bindings --all-targets -- -D warnings
  • ruff format --check app/ tests/unit/test_kernel_bridge.py tests/unit/test_pipeline_alignment.py tests/unit/test_pipeline_runner.py tests/unit/test_postprocess_segments_kernel.py
  • ruff check app/ tests/unit/test_kernel_bridge.py tests/unit/test_pipeline_alignment.py tests/unit/test_pipeline_runner.py tests/unit/test_postprocess_segments_kernel.py --ignore E501
  • python voscript-api/scripts/public_release_scan.py --root <repo>
  • git diff --check

Notes

  • Public HTTP schema stays unchanged.
  • speaker_label remains the stable cluster key; duplicate display names are disambiguated, not merged.
  • segments[].words remains optional.
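
To illustrate the second note, here is a small runnable sketch of display-name disambiguation. It mirrors the inline logic that the diff in this conversation shows for `_build_display_names`; the body of the shared `build_display_names` helper is not shown in the PR, so treat this as an approximation. The sample labels and names are made up.

```python
def build_display_names(speaker_labels, speaker_map):
    """Map each speaker_label to a display name, suffixing duplicates."""
    labels_by_name = {}
    for speaker_label in speaker_labels:
        match = speaker_map.get(speaker_label, {})
        # Fall back to the cluster label when no voiceprint match exists.
        speaker_name = str(match.get("matched_name") or speaker_label)
        labels_by_name.setdefault(speaker_name, []).append(speaker_label)

    display_names = {}
    for speaker_name, labels in labels_by_name.items():
        for index, speaker_label in enumerate(labels, start=1):
            # The first cluster keeps the plain name; later ones get " (n)".
            display_names[speaker_label] = (
                speaker_name if index == 1 else f"{speaker_name} ({index})"
            )
    return display_names


names = build_display_names(
    ["SPEAKER_00", "SPEAKER_01", "SPEAKER_02"],
    {
        "SPEAKER_00": {"matched_name": "Alice"},
        "SPEAKER_01": {"matched_name": "Alice"},
    },
)
# names == {"SPEAKER_00": "Alice", "SPEAKER_01": "Alice (2)", "SPEAKER_02": "SPEAKER_02"}
```

Both clusters matched to "Alice" stay separate entries keyed by `speaker_label`; only the displayed name changes.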

Copilot AI review requested due to automatic review settings June 9, 2026 15:30
claude Bot commented Jun 9, 2026

Claude encountered an error after 0s.

I'll analyze this and get back to you.

Copilot AI left a comment

Pull request overview

This PR introduces a deterministic, side-effect-free transcript post-processing “oracle” in Python and a matching Rust kernel (exposed via the existing providers.kernel_bridge) so RUST_KERNEL_MODE=required runs can hard-require native post-processing while keeping the default Python execution path.

Changes:

  • Added pure Python post-processing helpers for segment assembly, display-name disambiguation, conservative text-only merging, and JSON-safe word normalization.
  • Added a Rust post-processing implementation plus PyO3 bindings and Python bridge validation/entrypoint.
  • Extended unit tests, docs/changelog, environment/config documentation, and Docker heavy-gate smoke to cover the new Rust-backed path.

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `tests/unit/test_postprocess_segments_kernel.py` | New golden tests for the Python postprocess oracle behaviors (normalization, merging, display names, segment assembly). |
| `tests/unit/test_pipeline_runner.py` | Adds coverage ensuring artifacts provider selects Rust postprocess when RUST_KERNEL_MODE=required. |
| `tests/unit/test_pipeline_alignment.py` | Formatting-only updates to existing assertions. |
| `tests/unit/test_kernel_bridge.py` | Updates expected core version and adds bridge tests for postprocess_segments validation. |
| `doc/configuration.zh.md` | Documents that RUST_KERNEL_MODE now also selects result post-processing. |
| `doc/configuration.en.md` | Same as above (English). |
| `doc/changelog.zh.md` | Changelog entry for Rust-backed result post-processing + related coverage updates. |
| `doc/changelog.en.md` | Same as above (English). |
| `crates/voscript_core/src/postprocess.rs` | New Rust implementation for merging/normalizing aligned segments and building result segments. |
| `crates/voscript_core/src/lib.rs` | Exposes Rust postprocess via PyO3 (postprocess_segments) and parsing helpers. |
| `crates/voscript_core/Cargo.toml` | Bumps voscript_core version to 0.8.2. |
| `Cargo.lock` | Locks the updated voscript_core version. |
| `app/providers/kernel_bridge/runtime.py` | Adds Rust postprocess response validation and Python bridge entrypoint postprocess_segments. |
| `app/providers/kernel_bridge/__init__.py` | Re-exports postprocess_segments. |
| `app/providers/artifacts/default.py` | Switches segment assembly to Python oracle by default and Rust kernel when required. |
| `app/postprocess/segments.py` | New Python oracle module for post-processing logic used by both pipeline and Rust equivalence tests. |
| `app/postprocess/__init__.py` | Exposes the postprocess helpers as a package API. |
| `app/pipeline/stages/diarization/alignment.py` | Reuses the shared normalize_words helper from the new postprocess module. |
| `.github/workflows/rust-foundation-heavy.yml` | Extends Docker heavy-gate smoke to exercise postprocess_segments under required mode. |
| `.env.example` | Notes that required Rust mode now covers result post-processing in addition to voiceprint scoring. |

Comment on lines +145 to +148
    let word = match dict.get_item("word")? {
        Some(value) if !value.is_none() => value.str()?.to_string(),
        _ => String::new(),
    };
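
For comparison, the Rust lines above coerce a present, non-`None` `word` value with Python's `str()` and fall back to an empty string otherwise. A Python counterpart might look like the sketch below. The helper name `coerce_word` is hypothetical, and the PR does not show whether the Python oracle handles this field in exactly the same way.

```python
def coerce_word(entry: dict) -> str:
    """Return the entry's "word" as text, or "" when missing or None.

    Mirrors the Rust match arm: any non-None value goes through str(),
    so numbers and other objects become their string form.
    """
    value = entry.get("word")
    return "" if value is None else str(value)


samples = [{"word": "hello"}, {"word": None}, {}, {"word": 42}]
coerced = [coerce_word(sample) for sample in samples]
# coerced == ["hello", "", "", "42"]
```

Note that non-string values are stringified rather than rejected, which is worth keeping in mind when comparing the two implementations.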
Comment on lines 11 to 30
    @@ -24,60 +26,22 @@ def _build_display_names(
         speaker_labels: list[str],
         speaker_map: dict[str, dict],
     ) -> dict[str, str]:
    -    labels_by_name: dict[str, list[str]] = {}
    -
    -    for speaker_label in speaker_labels:
    -        match = speaker_map.get(speaker_label, {})
    -        speaker_name = str(match.get("matched_name") or speaker_label)
    -        labels_by_name.setdefault(speaker_name, []).append(speaker_label)
    -
    -    display_names: dict[str, str] = {}
    -    for speaker_name, labels in labels_by_name.items():
    -        for index, speaker_label in enumerate(labels, start=1):
    -            display_names[speaker_label] = (
    -                speaker_name if index == 1 else f"{speaker_name} ({index})"
    -            )
    -    return display_names
    +    return build_display_names(speaker_labels, speaker_map)

codecov Bot commented Jun 9, 2026

Codecov Report

❌ Patch coverage is 84.56790% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.22%. Comparing base (259f0bb) to head (e8dbeaa).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| `app/providers/kernel_bridge/runtime.py` | 68.05% | 23 Missing ⚠️ |
| `app/postprocess/segments.py` | 98.75% | 1 Missing ⚠️ |
| `app/providers/artifacts/default.py` | 85.71% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #37      +/-   ##
==========================================
- Coverage   91.59%   91.22%   -0.38%     
==========================================
  Files          79       81       +2     
  Lines        3333     3464     +131     
==========================================
+ Hits         3053     3160     +107     
- Misses        280      304      +24     

| Flag | Coverage Δ |
| --- | --- |
| unit | 91.22% <84.56%> (-0.38%) ⬇️ |
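
As a sanity check, the figures in the coverage diff above can be reproduced from the hit and line counts alone. The sketch below is only arithmetic on the reported numbers; the helper name `coverage_pct` is made up for illustration and is not part of Codecov.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage of covered lines."""
    return 100.0 * hits / lines


# Counts taken from the coverage diff in the report.
base_hits, base_lines = 3053, 3333
head_hits, head_lines = 3160, 3464

base_cov = coverage_pct(base_hits, base_lines)  # about 91.60
head_cov = coverage_pct(head_hits, head_lines)  # about 91.22
delta = head_cov - base_cov                     # about -0.38

# Misses follow from lines minus hits and match the reported 280 and 304.
base_misses = base_lines - base_hits
head_misses = head_lines - head_hits

# Per-file missing lines from the table add up to the reported 25.
missing_total = 23 + 1 + 1
```

The drop in project coverage comes from 24 additional missed lines against 131 added lines, most of them in `app/providers/kernel_bridge/runtime.py`.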

MapleEve merged commit 8b41265 into main on Jun 9, 2026
10 of 12 checks passed
MapleEve deleted the feat/0.8.2-postprocess-rust-kernel branch on June 9, 2026 15:38