feat: detect domain typosquatting against known providers #6

Description

@mdesoto

Context

Users sometimes sign up with misspelled versions of major email providers: gmial.com, yahooo.com, hotmal.com, outlok.com. These aren't disposable domains; they're misspellings of real providers, so blocklist matching won't catch them. They indicate either a typo (a genuine user who won't receive confirmation emails) or an intentional fake signup.

A Levenshtein distance check (or similar fuzzy match) against a curated list of major providers would catch these.
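
For reference, a minimal sketch of the distance calculation this would rely on. The function name `levenshtein` is hypothetical and not part of the current API; it is the standard dynamic-programming formulation, keeping only two rows in memory.

```javascript
// Classic Levenshtein distance: the minimum number of single-character
// insertions, deletions, and substitutions needed to turn `a` into `b`.
// Hypothetical helper, not an existing BurnerGuard function.
function levenshtein(a, b) {
    if (a === b) return 0;
    if (a.length === 0) return b.length;
    if (b.length === 0) return a.length;

    // prev[j] = distance between the first i-1 chars of `a` and the first j chars of `b`
    let prev = Array.from({ length: b.length + 1 }, (_, j) => j);

    for (let i = 1; i <= a.length; i++) {
        const curr = [i];
        for (let j = 1; j <= b.length; j++) {
            const cost = a[i - 1] === b[j - 1] ? 0 : 1;
            curr[j] = Math.min(
                prev[j] + 1,        // deletion
                curr[j - 1] + 1,    // insertion
                prev[j - 1] + cost  // substitution (or match)
            );
        }
        prev = curr;
    }
    return prev[b.length];
}

console.log(levenshtein('gmial.com', 'gmail.com'));    // 2 (the swapped "ia" costs two substitutions)
console.log(levenshtein('yahooo.com', 'yahoo.com'));   // 1 (one extra "o")
console.log(levenshtein('hotmal.com', 'hotmail.com')); // 1 (missing "i")
console.log(levenshtein('outlok.com', 'outlook.com')); // 1 (missing "o")
```

All four examples from the context land within a distance of 2, which is why that looks like a reasonable default threshold.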

Proposed behavior

Signal name

typosquatting

Detection

Maintain an internal list of major email providers (Gmail, Yahoo, Outlook, Hotmail, Protonmail, iCloud, AOL, etc.). When a domain is within a configurable edit distance of a known provider, flag it:

const result = await guard.verify('user@gmial.com');
// {
//     isMatch: true,
//     matchedOn: ['typosquatting'],
//     domain: 'gmial.com',
//     suggestedDomain: 'gmail.com'
// }

Configuration

const guard = await BurnerGuard.create({
    detectTyposquatting: true,              // default: false (opt-in)
    typosquattingMaxDistance: 2,             // max Levenshtein distance (default: 2)
    typosquattingProviders: ['gmail.com', 'yahoo.com', ...]  // override provider list
});

Off by default, since this is more opinionated than blocklist matching. Some legitimate domains may coincidentally be close to a major provider name (for example, mail.com is one edit away from gmail.com).
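
One possible way to resolve these options against their defaults, as a rough sketch only. `resolveTyposquattingOptions` and `TYPOSQUATTING_DEFAULTS` are hypothetical names, and using `null` to mean "use the built-in provider list" is an assumption, not something decided in this issue.

```javascript
// Hypothetical defaults mirroring the options proposed above.
const TYPOSQUATTING_DEFAULTS = Object.freeze({
    detectTyposquatting: false,   // opt-in
    typosquattingMaxDistance: 2,
    typosquattingProviders: null  // assumption: null means "use the built-in curated list"
});

// Merge user options over the defaults and validate them.
function resolveTyposquattingOptions(options = {}) {
    const resolved = { ...TYPOSQUATTING_DEFAULTS };
    for (const key of Object.keys(TYPOSQUATTING_DEFAULTS)) {
        if (options[key] !== undefined) resolved[key] = options[key];
    }

    const max = resolved.typosquattingMaxDistance;
    if (!Number.isInteger(max) || max < 1) {
        throw new RangeError('typosquattingMaxDistance must be a positive integer');
    }

    const providers = resolved.typosquattingProviders;
    if (providers !== null) {
        if (!Array.isArray(providers) || providers.length === 0) {
            throw new TypeError('typosquattingProviders must be a non-empty array of domains');
        }
        // Normalize so later comparisons are case-insensitive.
        resolved.typosquattingProviders = providers.map((p) => String(p).trim().toLowerCase());
    }
    return resolved;
}

console.log(resolveTyposquattingOptions({ detectTyposquatting: true, typosquattingMaxDistance: 1 }));
```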

Utility method

guard.suggestDomain('gmial.com');  // 'gmail.com'
guard.suggestDomain('gmail.com');  // null (exact match, no suggestion)
guard.suggestDomain('mycompany.com');  // null (not close to anything)
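
A standalone sketch of how `suggestDomain` could behave, assuming plain Levenshtein distance and a placeholder provider list. `DEFAULT_PROVIDERS` and `editDistance` are hypothetical names, and the real method would live on the guard instance and read its configuration from there.

```javascript
// Placeholder list built from the providers named in this issue;
// the real curated list is still to be decided.
const DEFAULT_PROVIDERS = [
    'gmail.com', 'yahoo.com', 'outlook.com', 'hotmail.com',
    'protonmail.com', 'icloud.com', 'aol.com'
];

// Two-row Levenshtein distance (insertions, deletions, substitutions).
function editDistance(a, b) {
    let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
    for (let i = 1; i <= a.length; i++) {
        const curr = [i];
        for (let j = 1; j <= b.length; j++) {
            const cost = a[i - 1] === b[j - 1] ? 0 : 1;
            curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
        }
        prev = curr;
    }
    return prev[b.length];
}

// Returns the closest provider within maxDistance, or null.
function suggestDomain(domain, providers = DEFAULT_PROVIDERS, maxDistance = 2) {
    const candidate = domain.trim().toLowerCase();

    // An exact match is a real provider, so there is nothing to suggest.
    if (providers.includes(candidate)) return null;

    let best = null;
    let bestDistance = maxDistance + 1;
    for (const provider of providers) {
        // The length difference is a lower bound on the edit distance,
        // so clearly distant providers can be skipped cheaply.
        if (Math.abs(provider.length - candidate.length) > maxDistance) continue;
        const distance = editDistance(candidate, provider);
        if (distance < bestDistance) {
            best = provider;
            bestDistance = distance;
        }
    }
    return best;
}

console.log(suggestDomain('gmial.com'));     // 'gmail.com'
console.log(suggestDomain('gmail.com'));     // null
console.log(suggestDomain('mycompany.com')); // null
```

Picking the closest provider rather than the first one under the threshold matters once the list contains similar names, such as hotmail.com and protonmail.com.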

Implementation notes

  • Levenshtein distance is O(n*m) per comparison, but domain strings are short (each DNS label is at most 63 characters), so performance is fine
  • Only check against the provider list, not the entire blocklist (thousands of entries would be slow)
  • The provider list should be small (~20-30 entries) and curated
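  • Consider also treating adjacent character swaps (transpositions) as a single edit, e.g. via Damerau-Levenshtein: plain Levenshtein already covers insertions, deletions, and substitutions, but counts a swap such as gmial.com → gmail.com as two edits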
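
To illustrate the transposition point, here is a sketch of the optimal string alignment distance, the restricted form of Damerau-Levenshtein that counts an adjacent swap as one edit. The name `osaDistance` is hypothetical, and whether to use this variant is still an open design choice.

```javascript
// Optimal string alignment (restricted Damerau-Levenshtein) distance:
// insertions, deletions, substitutions, and adjacent transpositions each cost 1.
// Hypothetical helper, shown only to compare against plain Levenshtein.
function osaDistance(a, b) {
    const rows = a.length + 1;
    const cols = b.length + 1;
    const d = Array.from({ length: rows }, () => new Array(cols).fill(0));

    for (let i = 0; i < rows; i++) d[i][0] = i; // delete everything from `a`
    for (let j = 0; j < cols; j++) d[0][j] = j; // insert everything from `b`

    for (let i = 1; i < rows; i++) {
        for (let j = 1; j < cols; j++) {
            const cost = a[i - 1] === b[j - 1] ? 0 : 1;
            d[i][j] = Math.min(
                d[i - 1][j] + 1,        // deletion
                d[i][j - 1] + 1,        // insertion
                d[i - 1][j - 1] + cost  // substitution (or match)
            );
            // Adjacent transposition: "ia" <-> "ai"
            if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
                d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
            }
        }
    }
    return d[a.length][b.length];
}

console.log(osaDistance('gmial.com', 'gmail.com'));  // 1 (plain Levenshtein gives 2)
console.log(osaDistance('yahooo.com', 'yahoo.com')); // 1 (same as plain Levenshtein)
```

With this variant, a stricter `typosquattingMaxDistance` of 1 would still catch swapped-letter typos, which might help reduce false positives on short domains.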

Acceptance criteria

  • Curated list of major email providers (~20-30)
  • Levenshtein distance calculation for domain comparison
  • typosquatting signal in matchedOn
  • suggestedDomain field in VerifyResult when typosquatting is detected
  • detectTyposquatting opt-in config with configurable distance and provider list
  • suggestDomain(domain) utility method
  • Tests covering: common misspellings, exact matches (no false positive), short domains, configurable distance
  • README section explaining typosquatting detection

Priority: 🟢 Nice-to-have — cool feature, less commonly needed than core detection

Labels: enhancement