Add keyframe animation pipeline and translation quality improvements#2
Add keyframe animation pipeline and translation quality improvements#2mrlucas679 wants to merge 6 commits into
Conversation
Animation engine (signs_library.js, avatar.js): - Add signWithFrames() factory, prebakeFrameQuats(), findFrame(), and slerpBetweenFrames() to support real motion-capture keyframe sequences - TransitionEngine.tick() now has a dual path: plays through keyframe curves when .frames is present, falls back to SLERP for existing signs - avatar.js respects per-sign .duration and wires sign-level NMM data (browLift, headNod, etc.) through existing setNMMs() automatically - All 284 hand-crafted signs are fully backward-compatible Data pipeline (convert_signs.py, scripts/): - convert_signs.py v2.0 extracts full frame sequences (not just peak frame) using angular-displacement keyframe selection (30fps → 8–12 keyframes) - scripts/record_signs.py: new MediaPipe Holistic webcam recorder that outputs keyframes in the same format as the converter - scripts/merge_sign_data.py: merge tool that generates signs_library_generated.js from real data without touching the source Translation (sasl_transformer/, backend/): - grammar_rules.py: 5 new SASL rules (classifiers, SASSes, topicalization, plurality via MANY, spatial loci) + uncertain token flag - models.py: GlossToken.uncertain field; TranslationResponse gains sign_coverage and fingerspelled_words - transformer.py: coverage scoring + conservative LLM retry when <70% of tokens have known signs - backend/main.py and WebSocket broadcast now surface sign_coverage and fingerspelled_words to the frontend Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR adds support for real keyframed sign animation (motion-capture/recorded sequences) and improves SASL translation quality/metadata surfaced to clients.
Changes:
- Added keyframe-capable sign factory/utilities and extended
TransitionEngineto play sign-internal frame sequences when available. - Introduced data tooling: upgraded
convert_signs.pyto output keyframes + duration, plus newrecord_signs.py(MediaPipe recorder) andmerge_sign_data.py(override generator). - Improved translation robustness: added new grammar rules, introduced
uncertaintokens, computedsign_coverage, and exposedfingerspelled_wordsvia REST/WebSocket.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
src/windows/deaf/avatar.js |
Applies sign-level NMM markers and uses sign-provided duration for hold time. |
signs_library.js |
Adds keyframe interpolation helpers and keyframe playback mode in TransitionEngine. |
scripts/record_signs.py |
New webcam recorder producing keyframe JSON compatible with the keyframe sign format. |
scripts/merge_sign_data.py |
New tool to generate a JS overrides layer for real-data frames/durations. |
sasl_transformer/transformer.py |
Adds coverage-based conservative retry + uncertain handling and coverage/fingerspell metadata. |
sasl_transformer/models.py |
Extends response/token models with uncertain, sign_coverage, fingerspelled_words. |
sasl_transformer/grammar_rules.py |
Adds new SASL grammar guidance and JSON output schema updates (uncertain flag). |
convert_signs.py |
Upgrades converter to extract all frames and downsample to keyframes with durations. |
backend/requirements.txt |
Adds MediaPipe/OpenCV deps for recording script usage. |
backend/main.py |
Surfaces sign_coverage and fingerspelled_words in API and WebSocket payloads. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // When the destination sign has real motion-capture keyframes, | ||
| // play through them before handing off to the inter-sign transition. | ||
| if (this._inSignPhase) { | ||
| this._signElapsed += deltaTime; | ||
| const signT = Math.min(this._signElapsed / this._signDuration, 1.0); | ||
| const { a, b, localT } = findFrame(this._to.frames, signT); | ||
| const pose = slerpBetweenFrames(a, b, this._easing(localT)); | ||
| if (signT >= 1.0) { | ||
| this._inSignPhase = false; // keyframe playback done; start inter-sign phase | ||
| this._elapsed = 0; |
There was a problem hiding this comment.
TransitionEngine keyframe mode currently plays destination sign frames immediately after begin(), before performing the from→to transition. After signT reaches 1.0 it resets _elapsed and falls back to the existing SLERP path, which will then transition from the previous sign to the destination sign’s start pose (toSign._Rq.start), effectively causing a jump/reversal (you’ll be at the last keyframe, then transition back toward the first). Keyframed signs should transition into the first keyframe first, then play through frames, then mark done (and optionally apply coarticulation near the end).
| // When the destination sign has real motion-capture keyframes, | |
| // play through them before handing off to the inter-sign transition. | |
| if (this._inSignPhase) { | |
| this._signElapsed += deltaTime; | |
| const signT = Math.min(this._signElapsed / this._signDuration, 1.0); | |
| const { a, b, localT } = findFrame(this._to.frames, signT); | |
| const pose = slerpBetweenFrames(a, b, this._easing(localT)); | |
| if (signT >= 1.0) { | |
| this._inSignPhase = false; // keyframe playback done; start inter-sign phase | |
| this._elapsed = 0; | |
| // For keyframed destination signs, first transition from the previous | |
| // sign into the destination sign's start pose, then play the keyframes. | |
| if (this._inSignPhase) { | |
| this._elapsed += deltaTime; | |
| const rawT = Math.min(this._elapsed / this._duration, 1.0); | |
| // Coarticulation: once past blendStart, also pull toward _next sign | |
| // This is what makes the avatar start reaching for the NEXT sign | |
| // before fully finishing the current one — matching real signer behaviour. | |
| let blendT; | |
| if (rawT >= this._blendStart && this._next) { | |
| const coarticProgress = (rawT - this._blendStart) / (1 - this._blendStart); | |
| const easedMain = this._easing(rawT); | |
| const coarticPull = Easing.easeInQuad(coarticProgress) * 0.25; | |
| blendT = Math.min(easedMain + coarticPull, 1.0); | |
| } else { | |
| blendT = this._easing(rawT); | |
| } | |
| // Phase 1: reach the destination sign's first pose. | |
| if (rawT < 1.0) { | |
| return this._interpolate(this._from, this._to, blendT); | |
| } | |
| // Phase 2: once the transition is complete, play the destination | |
| // sign's keyframes. Carry any overshoot from this tick into playback | |
| // so we don't lose time when the transition completes mid-frame. | |
| const signDelta = Math.max(this._elapsed - this._duration, 0); | |
| this._elapsed = this._duration; | |
| this._signElapsed += signDelta; | |
| const signT = Math.min(this._signElapsed / this._signDuration, 1.0); | |
| const { a, b, localT } = findFrame(this._to.frames, signT); | |
| const pose = slerpBetweenFrames(a, b, this._easing(localT)); | |
| if (signT >= 1.0) { | |
| this._inSignPhase = false; | |
| this._done = true; | |
| if (this._onComplete) this._onComplete(); |
| let lo = 0, hi = frames.length - 1; | ||
| while (lo < hi) { | ||
| const mid = (lo + hi) >> 1; | ||
| if (frames[mid].t < t) lo = mid + 1; | ||
| else hi = mid; | ||
| } | ||
| if (hi === 0) return { a: frames[0], b: frames[0], localT: 0 }; | ||
| const a = frames[hi - 1], b = frames[hi]; | ||
| const span = b.t - a.t; | ||
| const localT = span < 1e-6 ? 1.0 : (t - a.t) / span; |
There was a problem hiding this comment.
findFrame() can return a localT outside [0..1] if t is beyond the last frame’s t (e.g., if the final keyframe isn’t exactly t=1.0 due to rounding or user-provided data). That will drive slerpBetweenFrames() with extrapolated values. Consider clamping localT to [0..1] and/or explicitly handling t >= frames[frames.length-1].t to return the last frame pair with localT=1.
| let lo = 0, hi = frames.length - 1; | |
| while (lo < hi) { | |
| const mid = (lo + hi) >> 1; | |
| if (frames[mid].t < t) lo = mid + 1; | |
| else hi = mid; | |
| } | |
| if (hi === 0) return { a: frames[0], b: frames[0], localT: 0 }; | |
| const a = frames[hi - 1], b = frames[hi]; | |
| const span = b.t - a.t; | |
| const localT = span < 1e-6 ? 1.0 : (t - a.t) / span; | |
| const lastIndex = frames.length - 1; | |
| if (t <= frames[0].t) { | |
| return { a: frames[0], b: frames[0], localT: 0 }; | |
| } | |
| if (t >= frames[lastIndex].t) { | |
| return { a: frames[lastIndex], b: frames[lastIndex], localT: 1 }; | |
| } | |
| let lo = 0, hi = lastIndex; | |
| while (lo < hi) { | |
| const mid = (lo + hi) >> 1; | |
| if (frames[mid].t < t) lo = mid + 1; | |
| else hi = mid; | |
| } | |
| const a = frames[hi - 1], b = frames[hi]; | |
| const span = b.t - a.t; | |
| const localT = span < 1e-6 ? 1.0 : Math.max(0, Math.min(1, (t - a.t) / span)); |
| function signWithFrames(name, shape, desc, conf, frames, durationMs, nmm) { | ||
| if (!frames || frames.length < 2) { | ||
| throw new Error(`signWithFrames("${name}"): need at least 2 keyframes`); | ||
| } | ||
| prebakeFrameQuats(frames); | ||
| const first = frames[0]; | ||
| const last = frames[frames.length - 1]; | ||
| return { | ||
| name, shape, desc, conf, | ||
| frames, | ||
| duration: durationMs, | ||
| nmm: nmm || null, |
There was a problem hiding this comment.
signWithFrames() accepts durationMs but doesn’t validate it. TransitionEngine keyframe playback requires a positive duration (to compute _signDuration), and avatar.js uses currentSign.duration to set hold timing. Consider validating durationMs is a finite number > 0 (and optionally sorting/validating frames by ascending t) so malformed sign data fails fast.
| ' var override = SIGN_OVERRIDES[name];', | ||
| ' SIGN_LIBRARY[name].duration = override.duration;', | ||
| ' SIGN_LIBRARY[name].frames = override.frames;', | ||
| ' SIGN_LIBRARY[name].nmm = override.nmm;', | ||
| ' // Pre-bake quaternions for the keyframe engine', | ||
| ' if (typeof prebakeFrameQuats === "function") {', | ||
| ' prebakeFrameQuats(SIGN_LIBRARY[name].frames);', | ||
| ' } else if (window.AMANDLA_SIGNS && window.AMANDLA_SIGNS.prebakeFrameQuats) {', | ||
| ' window.AMANDLA_SIGNS.prebakeFrameQuats(SIGN_LIBRARY[name].frames);', | ||
| ' }', |
There was a problem hiding this comment.
applySignOverrides() patches duration/frames/nmm and prebakes frame quats, but it does not update the sign’s top-level R/L and _Rq/_Lq.{start,end} to match the first/last keyframes. TransitionEngine._interpolate() relies on toSign._Rq.start (and fromSign._Rq.end) for blending into the sign, so overrides can leave transitions using stale poses unrelated to the real-data frames. Consider updating SIGN_LIBRARY[name].R/L and _Rq/_Lq start/end from the first/last frame after prebake (or replacing the whole sign object with a signWithFrames()-created one).
| def write_generated_library(source_js_path, overrides_block, output_path): | ||
| """ | ||
| Write the generated library: re-export everything from source_js, | ||
| then append the overrides block. | ||
|
|
||
| For browser usage the generated file simply includes the source via a | ||
| <script> tag comment at the top (the HTML must load source_js first, | ||
| then this file). For Node.js / module usage a require() is emitted. | ||
| """ |
There was a problem hiding this comment.
write_generated_library() docstring claims the generated file “re-export[s] everything from source_js” / emits a Node require(), but the implementation only writes a header + SIGN_OVERRIDES block (no require/import, no bundling of the source). Either adjust the docstring/header to match the actual behavior (HTML must load signs_library.js first; Node must require it separately), or implement the described require/re-export behavior.
| def print_coverage_report(all_sign_names, real_data, source_js_path=None): | ||
| """Print a table showing which signs have real data vs remain synthetic.""" | ||
| if source_js_path: | ||
| lib_names = read_sign_names_from_js(source_js_path) | ||
| else: | ||
| lib_names = set() | ||
|
|
||
| total_lib = len(lib_names) if lib_names else '?' | ||
| total_real = len(real_data) | ||
| total_synth = (len(lib_names) - total_real) if lib_names else '?' |
There was a problem hiding this comment.
print_coverage_report() takes all_sign_names but never uses it, which makes the call site (passing set(sign_data.keys())) misleading and prevents reporting when the source JS isn’t available. Either remove the unused parameter or use it as a fallback library name set when source_js_path is missing/invalid (so totals aren’t just '?').
| return { | ||
| 'frames': keyframes, | ||
| 'duration': duration_ms, | ||
| 'source': all_frames[0]['joints'].get('source_file', 'smplx'), | ||
| 'n_raw_frames': n_total, |
There was a problem hiding this comment.
build_keyframe_entry() sets 'source' from all_frames[0]['joints'].get('source_file', 'smplx'), but extract_all_frames()/extract_joints_from_frame() no longer populate 'source_file'. As a result, 'source' will always fall back to 'smplx' and you lose the originating .pkl filename in the output metadata. Consider propagating the pkl basename into each frame (or pass it into build_keyframe_entry) so the JSON reports the correct source.
| # Coverage retry: if >30% of tokens will be fingerspelled, try again | ||
| # with a conservative prompt that constrains output to known signs. | ||
| coverage = self._compute_coverage(response) | ||
| if coverage < 0.70 and settings.gemini_api_key: | ||
| logger.warning( | ||
| "Low sign coverage (%.0f%%) for '%s' — retrying with conservative prompt", | ||
| coverage * 100, | ||
| english_text[:40], | ||
| ) | ||
| try: | ||
| retry_response = await self._translate_with_llm_conservative( | ||
| english_text, request | ||
| ) | ||
| retry_response = self._enrich_with_library(retry_response) | ||
| retry_coverage = self._compute_coverage(retry_response) | ||
| if retry_coverage >= coverage: | ||
| logger.info( | ||
| "Conservative retry improved coverage: %.0f%% → %.0f%%", | ||
| coverage * 100, retry_coverage * 100, | ||
| ) | ||
| response = retry_response | ||
| else: | ||
| logger.info("Conservative retry did not improve coverage — keeping original") | ||
| except Exception as retry_exc: | ||
| logger.warning("Conservative retry failed: %s", retry_exc) | ||
|
|
There was a problem hiding this comment.
The new coverage retry / uncertain-token fingerspelling logic is not covered by tests. Since this repo already has pytest coverage for sasl_transformer, consider adding unit tests for _enrich_with_library() and _compute_coverage() (e.g., uncertain=True forces SignType.FINGERSPELL + inclusion in fingerspelled_words, and sign_coverage matches expected ratios).
Five jobs:
- python-lint: ruff on backend/, sasl_transformer/, scripts/ and convert_signs.py
- python-tests: pytest tests/ (rule-based tests, no API key required)
- python-imports: smoke-import of all new modules (sasl_transformer, convert_signs)
- js-syntax: node --check on signs_library.js, avatar.js, src/main.js
- signs-library-check: Node script that verifies all required exports exist,
sign count ≥ 200, TransitionEngine methods, and findFrame() correctness
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rors Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Ignore E402/F601/W292 — pre-existing issues in backend files not
introduced by this PR
- Write check_lib.js to GITHUB_WORKSPACE instead of /tmp so
require('./signs_library.js') resolves correctly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- convert_signs.py: remove unused 'os' import, fix bare f-strings (F541) - merge_sign_data.py: fix bare f-strings (F541) - record_signs.py: remove unused 'os'/'time' imports (F401), remove assigned-but-never-used body_front variable (F841) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
signs_library.jsandavatar.jsnow support real motion-capture keyframe sequences viasignWithFrames(). TheTransitionEngineplays through multi-frame curves when data is available, falling back to the existing SLERP path for all 284 hand-crafted signs (fully backward-compatible).convert_signs.pyupgraded to extract full frame sequences using angular-displacement keyframe selection (30fps → 8–12 keyframes). Two new scripts:scripts/record_signs.py(MediaPipe webcam recorder) andscripts/merge_sign_data.py(merge tool that generatessigns_library_generated.jswithout touching the source library).sign_coverageandfingerspelled_wordsare now surfaced in API responses and WebSocket broadcasts.Test plan
signWithFrames()with 3 keyframes and verify the avatar traces through the intermediate posepython convert_signs.py --inspect <pkl>against a SignAvatars .pkl and confirm output hasframesarray with 8–12 entriespython scripts/record_signs.py, record a sign, confirm output JSON matches the converter formatpython scripts/merge_sign_data.py --data poses.json --output signs_library_generated.js --reportand verify the coverage tablesign_coverage≥ 0.7 in the response (no retry logged)fingerspelled_wordsis populated🤖 Generated with Claude Code