When a deny rule and a stored path encode the same characters under different Unicode normalization forms, the rule does not match the path and the content is served in cleartext.
Root cause
glob_matches in crates/gitlawb-node/src/visibility.rs matches by raw byte comparison:
fn glob_matches(glob: &str, path: &str) -> bool {
    let prefix = glob_prefix(glob);
    if prefix == "/" { return true; }
    path == prefix || path.starts_with(&format!("{prefix}/"))
}
str::starts_with / == compare UTF-8 bytes. The same grapheme has distinct byte sequences across normalization forms, e.g. é:
- NFC: U+00E9 -> C3 A9
- NFD: U+0065 U+0301 -> 65 CC 81
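The byte-level difference can be checked with the standard library alone. This is a minimal sketch; utf8_hex is a hypothetical helper name and is not part of the codebase.

```rust
/// Render the UTF-8 bytes of a string as space-separated uppercase hex.
/// (Hypothetical helper, for illustration only.)
fn utf8_hex(s: &str) -> String {
    s.bytes()
        .map(|b| format!("{b:02X}"))
        .collect::<Vec<_>>()
        .join(" ")
}

/// The two encodings of "é", written with escapes so that an editor or
/// clipboard cannot silently renormalize them.
const E_ACUTE_NFC: &str = "\u{00E9}"; // precomposed
const E_ACUTE_NFD: &str = "e\u{0301}"; // 'e' + combining acute accent
```

Calling utf8_hex on each constant yields the two byte sequences listed above, and E_ACUTE_NFC == E_ACUTE_NFD is false even though both render identically.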
So a deny rule stored as /sécret/** in NFC does not match a path stored as /sécret/x.txt in NFD. glob_matches returns false, the rule is dropped from the candidate set in visibility_check, best becomes None, and on a public repo the is_public fallback returns Allow. The denied blob is then served and pinned.
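The fail-open path described above can be reproduced in isolation. The sketch below copies glob_matches as quoted, but glob_prefix and visibility_check_sketch are simplified stand-ins written for this example (assumptions, not the real implementations): the stand-in prefix function only strips a trailing /**, and the stand-in check only models "no matching deny rule, so fall back to is_public".

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny,
}

/// ASSUMPTION: simplified stand-in for the real glob_prefix.
/// Strips a trailing "/**"; an empty result means the repo root.
fn glob_prefix(glob: &str) -> &str {
    let p = glob.strip_suffix("/**").unwrap_or(glob);
    if p.is_empty() { "/" } else { p }
}

/// As quoted in this report: raw byte comparison, no normalization.
fn glob_matches(glob: &str, path: &str) -> bool {
    let prefix = glob_prefix(glob);
    if prefix == "/" { return true; }
    path == prefix || path.starts_with(&format!("{prefix}/"))
}

/// ASSUMPTION: heavily simplified model of visibility_check.
/// A matching deny rule wins; otherwise the is_public fallback decides.
fn visibility_check_sketch(deny_globs: &[&str], path: &str, is_public: bool) -> Decision {
    if deny_globs.iter().any(|g| glob_matches(g, path)) {
        Decision::Deny
    } else if is_public {
        Decision::Allow
    } else {
        Decision::Deny
    }
}
```

With the rule "/s\u{00E9}cret/**" (NFC) and the path "/se\u{0301}cret/x.txt" (NFD) on a public repo, this model returns Decision::Allow; with the path in NFC it returns Decision::Deny.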
Why this happens in practice
The deny rule is entered by the owner (typically NFC, what most input methods and web forms produce). The path comes from git, which stores the bytes as committed. Files committed on macOS can be NFD: HFS+ stores filenames in a decomposed form close to NFD, and APFS preserves whatever form it is given, so decomposed names persist unless git's core.precomposeUnicode setting recomposes them. A cross-platform contributor is enough to produce the skew.
Verified
Confirmed end to end against the real visibility_check (throwaway test, not committed): an NFC deny rule /sécret/** plus an NFD path /s + e + U+0301 + cret/x.txt, anonymous/stranger caller, returns Decision::Allow. The same mismatch flows through withheld_blob_oids in git/visibility_pack.rs, so the blob is withheld neither on serve nor on pin.
Scope
Independent of and pre-existing relative to #84 (which fixed C-quoting and non-UTF-8 fail-closed in the ls-tree parse). That work made path bytes faithful; this is the next layer: even a faithful path can disagree with the rule on normalization.
Fix direction
Normalize both sides to a single form (NFC) before comparison, in glob_matches (and glob_prefix-derived comparisons), and normalize rule globs on write so stored rules are canonical. Consider whether to normalize stored/served paths consistently too. Add regression coverage for an NFC-rule vs NFD-path pair, asserting Deny.
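One possible shape for the comparison is sketched below, with the normalizer passed in as a parameter so the example stays std-only. Everything here is an assumption for illustration: rule_prefix is a simplified stand-in for glob_prefix, glob_matches_normalized is a hypothetical name, and toy_nfc is NOT real NFC (it composes only e + U+0301). A real fix would plug in a full NFC implementation, for instance the one provided by the unicode-normalization crate.

```rust
/// TOY stand-in for NFC normalization: composes only 'e' + U+0301 into U+00E9.
/// Not suitable for production; it exists so the sketch runs without dependencies.
fn toy_nfc(s: &str) -> String {
    s.replace("e\u{0301}", "\u{00E9}")
}

/// ASSUMPTION: simplified stand-in for glob_prefix (strips a trailing "/**").
fn rule_prefix(glob: &str) -> &str {
    let p = glob.strip_suffix("/**").unwrap_or(glob);
    if p.is_empty() { "/" } else { p }
}

/// Same comparison as glob_matches, but both sides are passed through
/// `normalize` first, so rule and path are compared in a single form.
fn glob_matches_normalized(
    glob: &str,
    path: &str,
    normalize: impl Fn(&str) -> String,
) -> bool {
    let prefix = normalize(rule_prefix(glob));
    let path = normalize(path);
    if prefix == "/" {
        return true;
    }
    path == prefix || path.starts_with(&format!("{prefix}/"))
}
```

Under these assumptions, glob_matches_normalized("/s\u{00E9}cret/**", "/se\u{0301}cret/x.txt", toy_nfc) matches, and so does the reverse pairing (NFD rule, NFC path). Normalizing at comparison time covers rules already stored in a non-canonical form; normalizing on write additionally keeps the stored rules canonical.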