Is your feature request related to a problem? Please describe.
When building audit logging or compliance pipelines on top of Presidio, AnonymizerEngine currently drops the confidence score from each RecognizerResult after anonymization. The resulting OperatorResult only carries start, end, entity_type, text, and operator - the score is silently lost.
This makes it impossible to answer post-anonymization questions like:
- "With what confidence was this entity anonymized?"
- "Were any entities anonymized below our confidence threshold?"
- "Which documents need human review based on low-confidence detections?"
without re-running the analyzer on the original text, which may no longer be available or desirable.
Describe the solution you'd like
Add an optional score field to OperatorResult, threaded through from the originating RecognizerResult during anonymization.
The score should:
- Default to None so existing usage and the deanonymize path are completely unaffected
- Be included in JSON serialization
- Be preserved through conflict resolution (the surviving entity keeps its own score, not that of any dropped entity)
- Be deserializable via .from_json()
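To make the requirements above concrete, here is a rough sketch of what the field could look like. OperatorResultSketch is a hypothetical stand-in written for this issue, not Presidio's actual OperatorResult, and the to_dict / from_json shapes are assumptions about how the real class would be extended.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class OperatorResultSketch:
    """Hypothetical stand-in for OperatorResult; not Presidio's actual class."""

    start: int
    end: int
    entity_type: str
    text: Optional[str] = None
    operator: Optional[str] = None
    score: Optional[float] = None  # proposed field; None keeps existing behavior

    def to_dict(self) -> dict:
        # The score is always serialized, including when it is None.
        return asdict(self)

    @classmethod
    def from_json(cls, data: dict) -> "OperatorResultSketch":
        # .get() lets payloads written before the change still load.
        return cls(
            start=data["start"],
            end=data["end"],
            entity_type=data["entity_type"],
            text=data.get("text"),
            operator=data.get("operator"),
            score=data.get("score"),
        )


with_score = OperatorResultSketch(0, 8, "PERSON", "<PERSON>", "replace", score=0.85)

# Round trip through JSON keeps the score.
payload = json.dumps(with_score.to_dict())
round_tripped = OperatorResultSketch.from_json(json.loads(payload))

# A payload without a score (e.g. produced before the change) loads with None.
legacy = OperatorResultSketch.from_json({"start": 0, "end": 8, "entity_type": "PERSON"})
```

Because the field defaults to None and is read with .get(), callers that never set or read the score should see no difference.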
Describe alternatives you've considered
Zipping analyzer_results with anonymized.items by position. This breaks because:
- Text positions shift in the anonymized output
- Conflict resolution drops some analyzer results, so the list lengths do not match
There is no reliable way to recover the score post-anonymization without carrying it through the engine.
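A small self-contained illustration of both failure modes. Detection and resolve_conflicts are hypothetical stand-ins: the real conflict resolution in AnonymizerEngine may follow different rules, and this sketch only assumes that an overlapping lower-score result gets dropped.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    """Hypothetical stand-in for an analyzer result."""

    start: int
    end: int
    entity_type: str
    score: float


def resolve_conflicts(detections: List[Detection]) -> List[Detection]:
    """Simplified stand-in: when spans overlap, keep the higher-score one."""
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: -d.score):
        if all(det.end <= k.start or det.start >= k.end for k in kept):
            kept.append(det)
    return sorted(kept, key=lambda d: d.start)


text = "Call John Smith at 555-0100"
analyzer_results = [
    Detection(5, 15, "PERSON", 0.85),
    Detection(5, 9, "PERSON", 0.40),  # overlaps the span above, so it is dropped
    Detection(19, 27, "PHONE_NUMBER", 0.75),
]
survivors = resolve_conflicts(analyzer_results)

# Pairing by position silently assigns the dropped entity's score to the phone number.
zipped_scores = [src.score for src, _ in zip(analyzer_results, survivors)]
true_scores = [s.score for s in survivors]

# Replace right to left so earlier offsets stay valid while the text changes.
anonymized = text
for det in reversed(survivors):
    anonymized = anonymized[: det.start] + f"<{det.entity_type}>" + anonymized[det.end :]

# The phone number started at 19 in the input but starts elsewhere in the output.
phone_start_after = anonymized.index("<PHONE_NUMBER>")
```

Here zip produces no error at all: it just truncates to the shorter list and returns wrong scores, which is worse than failing loudly in an audit pipeline.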
Additional context
This is a purely additive change - score: Optional[float] = None requires no breaking changes to existing code. The score is already present on RecognizerResult; it only needs to be passed along when constructing OperatorResult.
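For illustration only, a sketch of the pass-through and of the audit query it would enable. Detected, Anonymized and replace_entities are hypothetical stand-ins for RecognizerResult, OperatorResult and the engine's replace step; the 0.5 review threshold is likewise just an example value.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detected:
    """Hypothetical stand-in for RecognizerResult."""

    start: int
    end: int
    entity_type: str
    score: float


@dataclass
class Anonymized:
    """Hypothetical stand-in for OperatorResult with the proposed score field."""

    start: int
    end: int
    entity_type: str
    text: str
    operator: str
    score: Optional[float] = None


def replace_entities(text: str, detections: List[Detected]) -> Tuple[str, List[Anonymized]]:
    """Replace each non-overlapping span with <ENTITY_TYPE>, carrying the score along."""
    items: List[Anonymized] = []
    # Work right to left so untouched offsets to the left stay valid.
    for det in sorted(detections, key=lambda d: d.start, reverse=True):
        new_value = f"<{det.entity_type}>"
        text = text[: det.start] + new_value + text[det.end :]
        # Items already emitted sit to the right of this span and move with it.
        shift = len(new_value) - (det.end - det.start)
        for item in items:
            item.start += shift
            item.end += shift
        items.append(
            Anonymized(
                start=det.start,
                end=det.start + len(new_value),
                entity_type=det.entity_type,
                text=new_value,
                operator="replace",
                score=det.score,  # the only new piece: pass the score through
            )
        )
    return text, list(reversed(items))


output, items = replace_entities(
    "Call John Smith at 555-0100",
    [Detected(5, 15, "PERSON", 0.85), Detected(19, 27, "PHONE_NUMBER", 0.40)],
)

# Audit query on the anonymized result alone, without the original text.
needs_review = [i for i in items if i.score is not None and i.score < 0.5]
```

With the score on each item, the threshold check runs against the anonymized result only, so the original text and a second analyzer pass are not needed.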