Visibility deny rules miss paths under Unicode normalization (NFC/NFD) skew, leaking content #101

Description

@beardthelion

When a deny rule and a stored path encode the same characters under different Unicode normalization forms, the rule does not match the path and the content is served in cleartext.

Root cause

glob_matches in crates/gitlawb-node/src/visibility.rs matches by raw byte comparison:

fn glob_matches(glob: &str, path: &str) -> bool {
    let prefix = glob_prefix(glob);
    if prefix == "/" { return true; }
    path == prefix || path.starts_with(&format!("{prefix}/"))
}

str::starts_with and == both compare raw UTF-8 bytes, with no Unicode normalization. The same grapheme has a different byte sequence in each normalization form, e.g. é:

  • NFC: U+00E9 -> C3 A9
  • NFD: U+0065 U+0301 -> 65 CC 81
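The byte-level difference above can be checked with the standard library alone. The following is a minimal sketch; the constant and helper names (E_ACUTE_NFC, E_ACUTE_NFD, hex_bytes, raw_prefix_match) are illustrative and do not come from the gitlawb-node source.

```rust
/// "é" as one precomposed code point (NFC): U+00E9.
const E_ACUTE_NFC: &str = "\u{00E9}";
/// "é" as a base letter plus combining acute accent (NFD): U+0065 U+0301.
const E_ACUTE_NFD: &str = "e\u{0301}";

/// Formats a string's UTF-8 bytes as space-separated uppercase hex.
/// (Hypothetical helper, for illustration only.)
fn hex_bytes(s: &str) -> String {
    s.bytes()
        .map(|b| format!("{b:02X}"))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Prefix test using `str::starts_with`, i.e. a raw byte comparison
/// with no normalization on either side. (Hypothetical helper.)
fn raw_prefix_match(prefix: &str, path: &str) -> bool {
    path.starts_with(prefix)
}
```

With these definitions, hex_bytes(E_ACUTE_NFC) yields "C3 A9" and hex_bytes(E_ACUTE_NFD) yields "65 CC 81", and raw_prefix_match("/s\u{00E9}cret/", "/se\u{0301}cret/x.txt") is false even though both strings render identically.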

So a deny rule stored as /sécret/** in NFC does not match a path stored as /sécret/x.txt in NFD. glob_matches returns false, so visibility_check drops the rule from its candidate set and best becomes None. On a public repo the is_public fallback then returns Allow, and the denied blob is served and pinned.
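The fall-through can be modeled in a few lines. This is a simplified sketch, not the real visibility_check: the Rule struct, the visibility_check_model function, the body of glob_prefix (here it only strips a trailing "/**"), the longest-glob tie-break, and the Deny result for private repos with no matching rule are all assumptions made for illustration. Only glob_matches is copied from the snippet above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Decision {
    Allow,
    Deny,
}

/// Hypothetical rule shape: a glob plus the decision it carries.
struct Rule {
    glob: String,
    decision: Decision,
}

/// Assumed stand-in for glob_prefix: strips a trailing "/**".
fn glob_prefix(glob: &str) -> &str {
    match glob.strip_suffix("/**") {
        Some("") => "/",
        Some(prefix) => prefix,
        None => glob,
    }
}

/// Same logic as the glob_matches quoted in this issue (raw byte comparison).
fn glob_matches(glob: &str, path: &str) -> bool {
    let prefix = glob_prefix(glob);
    if prefix == "/" {
        return true;
    }
    path == prefix || path.starts_with(&format!("{prefix}/"))
}

/// Simplified model of the decision flow described above.
fn visibility_check_model(rules: &[Rule], path: &str, is_public: bool) -> Decision {
    // Assumption: among matching rules, the longest glob wins.
    let best = rules
        .iter()
        .filter(|rule| glob_matches(&rule.glob, path))
        .max_by_key(|rule| rule.glob.len());
    match best {
        Some(rule) => rule.decision,
        // No rule matched: a public repo falls back to Allow.
        None if is_public => Decision::Allow,
        // Assumption: a private repo with no matching rule denies.
        None => Decision::Deny,
    }
}
```

In this model, a single Deny rule with glob "/s\u{00E9}cret/**" yields Deny for the path "/s\u{00E9}cret/x.txt" but Allow for "/se\u{0301}cret/x.txt" on a public repo, which mirrors the leak reported here.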

Why this happens in practice

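The deny rule is entered by the owner, typically in NFC, which is what most input methods and web forms produce. The path comes from git, which stores the bytes as committed. Files committed on macOS are commonly NFD: HFS+ stores filenames in a decomposed form close to NFD, and decomposed names can still reach commits from APFS volumes, which preserve whatever form a name was created with. A single cross-platform contributor is enough to produce the skew.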

Verified

Confirmed end to end against the real visibility_check (throwaway test, not committed): an NFC deny rule /sécret/** plus an NFD path /s + e + U+0301 + cret/x.txt, with an anonymous/stranger caller, returns Decision::Allow. The same mismatch flows through withheld_blob_oids in git/visibility_pack.rs, so the blob is withheld neither on serve nor on pin.

Scope

This bug is independent of #84 and predates it (#84 fixed C-quoting and non-UTF-8 fail-closed in the ls-tree parse). That work made path bytes faithful; this is the next layer: even a faithful path can disagree with the rule on normalization.

Fix direction

Normalize both sides to a single form (NFC) before comparison, in glob_matches (and glob_prefix-derived comparisons), and normalize rule globs on write so stored rules are canonical. Consider whether to normalize stored/served paths consistently too. Add regression coverage for an NFC-rule vs NFD-path pair, asserting Deny.
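One possible shape for the comparison fix is sketched below, under loud assumptions. nfc_toy is a toy stand-in that only composes a, e, i, o, u followed by U+0301; it is NOT real NFC. Production code would need a full Unicode normalizer, for example the nfc() adapter from the unicode-normalization crate, if that dependency is acceptable. The glob_prefix body and the name glob_matches_normalized are likewise hypothetical.

```rust
/// Toy stand-in for NFC: folds U+0301 into a preceding a/e/i/o/u.
/// Covers only these five pairs; a real normalizer is required in practice.
fn nfc_toy(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if c == '\u{0301}' {
            // Look at the last emitted char and compose if we know the pair.
            let composed = match out.chars().last() {
                Some('a') => Some('\u{00E1}'),
                Some('e') => Some('\u{00E9}'),
                Some('i') => Some('\u{00ED}'),
                Some('o') => Some('\u{00F3}'),
                Some('u') => Some('\u{00FA}'),
                _ => None,
            };
            if let Some(precomposed) = composed {
                out.pop();
                out.push(precomposed);
                continue;
            }
        }
        out.push(c);
    }
    out
}

/// Assumed stand-in for glob_prefix: strips a trailing "/**".
fn glob_prefix(glob: &str) -> &str {
    match glob.strip_suffix("/**") {
        Some("") => "/",
        Some(prefix) => prefix,
        None => glob,
    }
}

/// Same comparison as glob_matches, but with both sides normalized first.
fn glob_matches_normalized(glob: &str, path: &str) -> bool {
    let glob = nfc_toy(glob);
    let path = nfc_toy(path);
    let prefix = glob_prefix(&glob);
    if prefix == "/" {
        return true;
    }
    path == prefix || path.starts_with(&format!("{prefix}/"))
}
```

The regression case from this issue then reads glob_matches_normalized("/s\u{00E9}cret/**", "/se\u{0301}cret/x.txt"), which is true in this sketch, so the deny rule stays in the candidate set. Normalizing inside the matcher protects rules that were stored before any on-write normalization existed; normalizing on write additionally keeps stored rules canonical.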

Metadata

Labels

  • crate:node (gitlawb-node: the serving node and REST API)
  • kind:security (Vulnerability fix or hardening)
  • sev:medium (Degraded but workaround exists)
  • subsystem:visibility (Path-scoped visibility and content withholding)
