Feature Request: Include confidence score in AnonymizerEngine output (OperatorResult) for audit/traceability #2057

Description

@kshvg

Is your feature request related to a problem? Please describe.

When building audit logging or compliance pipelines on top of Presidio, AnonymizerEngine currently drops the confidence score from each RecognizerResult after anonymization. The resulting OperatorResult only carries start, end, entity_type, text, and operator - the score is silently lost.

This makes it impossible to answer post-anonymization questions like:

  • "With what confidence was this entity anonymized?"
  • "Were any entities anonymized below our confidence threshold?"
  • "Which documents need human review based on low-confidence detections?"

without re-running the analyzer on the original text, which may no longer be available or desirable.
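As a rough illustration of the audit checks listed above, here is a minimal sketch that assumes each serialized anonymizer item carries a `score` key. That key does not exist today; it is the field this request proposes. The item dictionaries, `REVIEW_THRESHOLD`, and `low_confidence` are hypothetical names used only for this example.

```python
# Hypothetical threshold below which a detection should get human review.
REVIEW_THRESHOLD = 0.6

# Hypothetical serialized items, as they might look if the proposed
# "score" field were included in the anonymizer output.
items = [
    {"start": 5, "end": 13, "entity_type": "PERSON",
     "operator": "replace", "score": 0.85},
    {"start": 17, "end": 31, "entity_type": "PHONE_NUMBER",
     "operator": "replace", "score": 0.4},
]


def low_confidence(items, threshold):
    """Return items whose score is present and below the threshold."""
    return [
        item for item in items
        if item.get("score") is not None and item["score"] < threshold
    ]


flagged = low_confidence(items, REVIEW_THRESHOLD)
needs_review = bool(flagged)  # True if the document should be reviewed
```

Items without a score (for example, output written before the field existed) are skipped rather than flagged; a stricter pipeline might treat a missing score as requiring review.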

Describe the solution you'd like

Add an optional score field to OperatorResult, threaded through from the originating RecognizerResult during anonymization.

The score should:

  • Default to None so existing usage and the deanonymize path are completely unaffected
  • Be included in JSON serialization
  • Be preserved through conflict resolution (the surviving entity keeps its own score, not that of any dropped entity)
  • Be deserializable via .from_json()
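The requirements above could look roughly like the following. This is a sketch using a stand-in dataclass, not Presidio's actual `OperatorResult` implementation; `OperatorResultSketch` is a hypothetical name, and only the field names (`start`, `end`, `entity_type`, `text`, `operator`, plus the proposed `score`) come from this issue.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class OperatorResultSketch:
    """Stand-in for OperatorResult; not the Presidio class itself."""

    start: int
    end: int
    entity_type: str
    text: Optional[str] = None
    operator: Optional[str] = None
    # Proposed field: defaults to None so existing callers are unaffected.
    score: Optional[float] = None

    def to_dict(self) -> dict:
        # The score is serialized alongside the existing fields.
        return asdict(self)

    @classmethod
    def from_json(cls, data: dict) -> "OperatorResultSketch":
        # .get() lets payloads written before the field existed still load.
        return cls(
            start=data["start"],
            end=data["end"],
            entity_type=data["entity_type"],
            text=data.get("text"),
            operator=data.get("operator"),
            score=data.get("score"),
        )


item = OperatorResultSketch(0, 8, "PERSON", "<PERSON>", "replace", score=0.85)
payload = json.dumps(item.to_dict())
restored = OperatorResultSketch.from_json(json.loads(payload))

# A payload without "score" deserializes with score=None.
legacy = OperatorResultSketch.from_json(
    {"start": 0, "end": 8, "entity_type": "PERSON"}
)
```

In the engine itself, the score would be copied from the surviving RecognizerResult at the point where the OperatorResult is constructed, so conflict resolution needs no extra bookkeeping.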

Describe alternatives you've considered
Zipping analyzer_results with anonymized.items by position - but this breaks because:

  1. Text positions shift in the anonymized output
  2. Conflict resolution drops some analyzer results, so the list lengths don't match

There is no reliable way to recover the score post-anonymization without carrying it through the engine.
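A toy reproduction of both failure modes, using plain tuples rather than Presidio objects. The sample text, the scores, and the helpers `drop_contained` and `anonymize` are hypothetical; the conflict resolution here is a simplified stand-in (drop any span strictly contained in a longer one), not the engine's actual logic.

```python
text = "Call John Smith at 555-1234"

# Hypothetical analyzer output: (start, end, entity_type, score)
analyzer_results = [
    (5, 15, "PERSON", 0.85),
    (5, 9, "PERSON", 0.40),         # nested inside the span above
    (19, 27, "PHONE_NUMBER", 0.75),
]


def drop_contained(results):
    """Toy conflict resolution: drop spans nested inside a longer span."""
    kept = []
    for r in results:
        nested = any(
            o[0] <= r[0] and r[1] <= o[1] and (o[1] - o[0]) > (r[1] - r[0])
            for o in results
        )
        if not nested:
            kept.append(r)
    return kept


def anonymize(text, results):
    """Replace each span with <ENTITY_TYPE> and record the new offsets."""
    parts, items, cursor, length = [], [], 0, 0
    for start, end, entity_type, _score in sorted(results):
        prefix = text[cursor:start]
        replacement = f"<{entity_type}>"
        new_start = length + len(prefix)
        parts += [prefix, replacement]
        length = new_start + len(replacement)
        items.append((new_start, length, entity_type))  # score is lost here
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts), items


anonymized, items = anonymize(text, drop_contained(analyzer_results))

# Pairing by position silently attaches the wrong score.
paired = list(zip(analyzer_results, items))
```

Here `paired[1]` matches the dropped nested PERSON result (score 0.40) with the PHONE_NUMBER item, and the item offsets no longer correspond to the analyzer offsets, so matching on `(start, end)` fails as well.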

Additional context
This is a purely additive change: declaring the field as score: Optional[float] = None requires no changes to existing code. The score is already present on RecognizerResult; it just needs to be passed along when constructing OperatorResult.
