
feat(rewrite): revisit HIGH sensitivity classification and any_high_leaked trigger for needs_human_review #107

@lipikaramaswamy


Background

needs_human_review is triggered when either of two conditions holds: leakage_mass > flag_leakage_mass_above (default 2.0) or any_high_leaked=True. An investigation of TAB runs showed that every flagged row was triggered by any_high_leaked, not by leakage mass.
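A minimal Python sketch of the current rule as described in this issue. It shows that a HIGH leak flags a row regardless of leakage mass. The threshold name flag_leakage_mass_above and its 2.0 default come from the issue; the RowResult dataclass and the needs_human_review function signature are hypothetical and are not taken from the actual rewrite code.

```python
from dataclasses import dataclass

# Default threshold named in the issue; RowResult and the function below are hypothetical.
FLAG_LEAKAGE_MASS_ABOVE = 2.0


@dataclass
class RowResult:
    leakage_mass: float
    any_high_leaked: bool


def needs_human_review(row: RowResult, flag_leakage_mass_above: float = FLAG_LEAKAGE_MASS_ABOVE) -> bool:
    """Current rule as described: either condition alone flags the row."""
    return row.leakage_mass > flag_leakage_mass_above or row.any_high_leaked


# A row with low leakage mass is still flagged if any HIGH entity leaked.
print(needs_human_review(RowResult(leakage_mass=0.96, any_high_leaked=True)))   # True
print(needs_human_review(RowResult(leakage_mass=0.96, any_high_leaked=False)))  # False
print(needs_human_review(RowResult(leakage_mass=2.5, any_high_leaked=False)))   # True
```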

The entities flagged as HIGH sensitivity were:

These are quasi-identifiers at most. Flagging them as HIGH causes needs_human_review=True even when leakage mass is low and utility is good (e.g. utility=0.97, leakage=0.96).
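Assuming any_high_leaked is computed as "at least one leaked entity is labelled HIGH" (an assumption; the issue does not show how it is derived), the sketch below shows why reclassifying a quasi-identifier from HIGH to MEDIUM clears the flag. The Sensitivity enum, the LeakedEntity dataclass, and the entity labels are hypothetical placeholders, not the entities from the TAB runs.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Iterable


class Sensitivity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class LeakedEntity:
    label: str  # hypothetical placeholder label, not a real entity from the TAB runs
    sensitivity: Sensitivity
    leaked: bool


def any_high_leaked(entities: Iterable[LeakedEntity]) -> bool:
    # Assumed derivation: True if at least one leaked entity is labelled HIGH.
    return any(e.leaked and e.sensitivity is Sensitivity.HIGH for e in entities)


row = [
    LeakedEntity("quasi_id_a", Sensitivity.HIGH, leaked=True),
    LeakedEntity("direct_id_b", Sensitivity.HIGH, leaked=False),
]
print(any_high_leaked(row))  # True: the leaked quasi-identifier is labelled HIGH

# Demote the quasi-identifier to MEDIUM; only the unleaked HIGH entity remains.
reclassified = [replace(row[0], sensitivity=Sensitivity.MEDIUM), row[1]]
print(any_high_leaked(reclassified))  # False
```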

Questions to resolve

  • What are the right criteria for HIGH vs. MEDIUM sensitivity? Should quasi-identifiers ever be HIGH?
  • Should any_high_leaked alone be sufficient to trigger needs_human_review, or should it require a minimum leakage mass too?
  • Should sensitivity level be configurable per domain (e.g. LEGAL may have different thresholds than MEDICAL)?
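One way the second and third questions could be explored, sketched in Python under stated assumptions: a HIGH leak only triggers review once leakage mass reaches a minimum, and both thresholds are looked up per domain. ReviewPolicy, min_leakage_mass_for_high, DOMAIN_POLICIES, and every threshold value are hypothetical placeholders, not proposed defaults. With min_leakage_mass_for_high=0.0, the default policy reproduces the current rule.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ReviewPolicy:
    # Existing threshold from the issue; default 2.0.
    flag_leakage_mass_above: float = 2.0
    # Hypothetical: a HIGH leak only triggers review at or above this mass.
    # 0.0 reproduces the current "any_high_leaked alone is sufficient" rule.
    min_leakage_mass_for_high: float = 0.0


# Hypothetical per-domain overrides; the numbers are placeholders, not recommendations.
DOMAIN_POLICIES: Dict[str, ReviewPolicy] = {
    "LEGAL": ReviewPolicy(flag_leakage_mass_above=2.0, min_leakage_mass_for_high=1.0),
    "MEDICAL": ReviewPolicy(flag_leakage_mass_above=1.0, min_leakage_mass_for_high=0.0),
}
DEFAULT_POLICY = ReviewPolicy()


def needs_human_review(
    leakage_mass: float,
    any_high_leaked: bool,
    domain: Optional[str] = None,
) -> bool:
    # Unknown or missing domains fall back to the default policy.
    policy = DOMAIN_POLICIES.get(domain, DEFAULT_POLICY) if domain else DEFAULT_POLICY
    mass_trigger = leakage_mass > policy.flag_leakage_mass_above
    high_trigger = any_high_leaked and leakage_mass >= policy.min_leakage_mass_for_high
    return mass_trigger or high_trigger


print(needs_human_review(0.96, True))             # True  (default policy: HIGH leak alone flags)
print(needs_human_review(0.96, True, "LEGAL"))    # False (below LEGAL's HIGH-leak minimum)
print(needs_human_review(1.5, False, "MEDICAL"))  # True  (above MEDICAL's mass threshold)
```

Keeping the domain lookup separate from the trigger rule means per-domain tuning would only change data, not the review logic itself.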

Related
