Background
The rewrite pipeline's final judge currently uses Option A: privacy, quality, and naturalness, each scored on a 1–10 scale, adapted from an earlier research repo. PR #86 improved the judge prompt wording and iterated on scoring behavior as an interim fix, but the formal rubric decision was never made. This issue closes that out.
Closes #37.
Options under consideration
| Option | Dimensions | Scale | Notes |
|---|---|---|---|
| A (current) | privacy, quality, naturalness | 1–10 | Baseline; quality and naturalness overlap in practice |
| B | privacy, utility, naturalness | 1–10 with improved anchors | Renames quality → utility for clarity; adds explicit anchor descriptions per score band |
| C | privacy, utility, faithfulness, fluency | 1–5 | More granular separation of preservation (faithfulness) vs. readability (fluency); narrower scale may reduce score clustering |
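To make the comparison concrete, the three options above can be written down as data. The following is a minimal sketch, not the pipeline's actual code: the `Rubric` class, the `OPTION_*` constants, and the `normalize` helper are hypothetical names introduced here for illustration, and mapping scores to [0, 1] is an assumed way to compare a 1–10 rubric against a 1–5 one, not something this issue specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rubric:
    """Hypothetical container: named dimensions scored on one integer scale."""

    name: str
    dimensions: tuple[str, ...]
    scale_min: int
    scale_max: int

    def validate(self, scores: dict[str, int]) -> None:
        """Raise ValueError unless scores cover exactly this rubric's dimensions, in range."""
        missing = set(self.dimensions) - set(scores)
        extra = set(scores) - set(self.dimensions)
        if missing or extra:
            raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
        for dim, value in scores.items():
            if not self.scale_min <= value <= self.scale_max:
                raise ValueError(
                    f"{dim}={value} is outside {self.scale_min}..{self.scale_max}"
                )

    def normalize(self, scores: dict[str, int]) -> dict[str, float]:
        """Map each score to [0, 1] so rubrics with different scales are comparable."""
        self.validate(scores)
        span = self.scale_max - self.scale_min
        return {d: (scores[d] - self.scale_min) / span for d in self.dimensions}


# The three options from the table (dimension names and scales taken from it).
OPTION_A = Rubric("A", ("privacy", "quality", "naturalness"), 1, 10)
OPTION_B = Rubric("B", ("privacy", "utility", "naturalness"), 1, 10)
OPTION_C = Rubric("C", ("privacy", "utility", "faithfulness", "fluency"), 1, 5)
```

A structure like this would let judge outputs under any candidate rubric be validated and compared on a common scale while the options are evaluated. Note that Option B's anchor descriptions are not modeled here, since the issue does not list them.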
Scope of work
- `FinalJudgeWorkflow` implementation
Related