[BUG] Project Database Hash Collision Risk #152

@olddev94

Project

vgrep

Description

The Config::hash_path() function uses only the first 8 bytes (64 bits) of a SHA-256 hash to generate the database filename for each project. Although 64 bits gives about 1.8 × 10^19 possible values, the birthday bound means the chance that some pair of projects collides grows roughly with the square of the number of projects. Two different project paths whose truncated hashes collide would share the same database file, causing data corruption.
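
To put numbers on the birthday bound, here is a rough sketch using only the standard library. It is not taken from vgrep; `collision_probability` is a hypothetical helper, and it assumes the truncated hash values are uniformly distributed.

```rust
/// Approximate probability that at least two of `n` items collide when each
/// gets a uniformly random `bits`-bit value (standard birthday approximation):
///     p ≈ 1 - exp(-n(n-1) / (2 * 2^bits))
/// Hypothetical helper for illustration, not part of vgrep.
pub fn collision_probability(n: f64, bits: u32) -> f64 {
    let space = 2f64.powi(bits as i32);
    let exponent = n * (n - 1.0) / (2.0 * space);
    // 1 - e^(-x), computed via exp_m1 so tiny probabilities keep precision.
    -(-exponent).exp_m1()
}
```

For a 64-bit prefix this gives about 2.7 × 10^-8 for one million projects and about 0.39 for 2^32 projects, so the risk is tiny at realistic scales but never zero.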

Error Message

No error. Data is silently corrupted when a collision occurs.

Debug Logs

System Information

- Bounty Version: 0.1.0
- OS: Ubuntu 24.04 LTS
- Rust: 1.75+

Screenshots

No response

Steps to Reproduce

  1. Find two paths that produce the same 8-byte SHA256 prefix (requires brute force or luck)
  2. Index project A at path X
  3. Index project B at path Y (where hash(X)[..8] == hash(Y)[..8])
  4. Project B's data overwrites Project A's data
  5. Search in Project A returns results from Project B
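
Step 1 is impractical against a real 64-bit prefix, but the brute-force idea can be sketched on a much shorter prefix. The code below is only an illustration: it uses the standard library's `DefaultHasher` as a stand-in for SHA-256 (an assumption, since vgrep's actual hashing code is not shown here), and `truncated_hash`, `find_collision`, and the generated paths are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for a truncated digest: keeps only the low `bits` bits of a
/// 64-bit hash. (Assumption: vgrep truncates SHA-256 instead.)
pub fn truncated_hash(path: &str, bits: u32) -> u64 {
    let mut hasher = DefaultHasher::new();
    path.hash(&mut hasher);
    let full = hasher.finish();
    if bits >= 64 { full } else { full & ((1u64 << bits) - 1) }
}

/// Generates made-up project paths until two of them share a truncated hash.
/// Returns the two colliding paths and how many paths were tried.
pub fn find_collision(bits: u32, max_tries: u64) -> Option<(String, String, u64)> {
    let mut seen: HashMap<u64, String> = HashMap::new();
    for i in 0..max_tries {
        let path = format!("/home/user/project-{i}");
        let key = truncated_hash(&path, bits);
        // `insert` returns the previous value if the key was already present.
        if let Some(previous) = seen.insert(key, path.clone()) {
            return Some((previous, path, i + 1));
        }
    }
    None
}
```

With a 16-bit prefix, `find_collision(16, 70_000)` is guaranteed to succeed by the pigeonhole principle, and typically does so after a few hundred paths. The same search against 64 bits would need on the order of 2^32 paths.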

Expected Behavior

  1. Each project should have a guaranteed unique database file
  2. Hash collisions should be detected or impossible
  3. If a collision occurs, warn the user or fall back to a different naming scheme
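
One possible shape for the detection in points 2 and 3, as a minimal sketch: record which project path owns each database name and refuse to reuse a name for a different path. `DbRegistry`, `Collision`, and `claim` are hypothetical names, not vgrep's API, and a real implementation would more likely store the owning path inside each database file than in memory.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical registry: database file name -> project path that owns it.
pub struct DbRegistry {
    owners: HashMap<String, PathBuf>,
    /// Naming function under test, e.g. a truncated-hash scheme.
    name_for: fn(&Path) -> String,
}

/// Returned when two different project paths map to the same database name.
#[derive(Debug, PartialEq)]
pub struct Collision {
    pub db_name: String,
    pub existing: PathBuf,
    pub requested: PathBuf,
}

impl DbRegistry {
    pub fn new(name_for: fn(&Path) -> String) -> Self {
        Self { owners: HashMap::new(), name_for }
    }

    /// Returns the database name for `project`, or an error if that name is
    /// already owned by a different project path.
    pub fn claim(&mut self, project: &Path) -> Result<String, Collision> {
        let db_name = (self.name_for)(project);
        if let Some(owner) = self.owners.get(&db_name) {
            if owner.as_path() != project {
                return Err(Collision {
                    existing: owner.clone(),
                    requested: project.to_path_buf(),
                    db_name,
                });
            }
            return Ok(db_name); // same project re-opening its own database
        }
        self.owners.insert(db_name.clone(), project.to_path_buf());
        Ok(db_name)
    }
}
```

On `Err(Collision { .. })` the caller could warn the user or retry with a longer hash prefix instead of silently sharing the file.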

Actual Behavior

  1. With a 64-bit hash, the probability of at least one collision among n projects is roughly n^2 / 2^65, reaching about 39% at 2^32 (~4.3 billion) projects
  2. Collisions cause silent database sharing/corruption
  3. No detection mechanism exists

Additional Context

No response

Labels

bug (Something isn't working), valid (Valid issue), vgrep
