fix(es): upper-case NIF/NIE before mod-23 checksum so valid lowercase IDs are detected by AUTHENSOR · Pull Request #2076 · data-privacy-stack/presidio

AUTHENSOR · 2026-06-18T07:04:05Z

Change Description

EsNifRecognizer and EsNieRecognizer were dropping valid lowercase Spanish
national identifiers, so the PII passed through the analyze → anonymize
flow unredacted.

Both recognizers compile their pattern with IGNORECASE, so a lowercase ID
such as 12345678z (NIF/DNI) or x1234567l (NIE) is matched as a candidate
span. But validate_result then compares the extracted control letter
against the uppercase mod-23 table "TRWAGMYFPDXBNJZSQVHLCKE" without
normalizing case:

NIF: letter = pattern_text[-1] is 'z', which never equals the
uppercase letters[number % 23] → checksum "fails" → result dropped.
NIE: in addition to the control letter, the leading prefix check
(pattern_text[:1] not in "XYZ") and the "XYZ".index(pattern_text[0])
lookup also assume uppercase, so a lowercase x… is rejected outright.

The uppercase forms of the same IDs (12345678Z, X1234567L) are detected at
score 1.0. For the NIE, the lowercase form is missed by every default
recognizer, so this change is purely a fix for a false-negative leak.
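
To make the failure mode concrete, here is a minimal standalone sketch of a case-sensitive mod-23 check. It is not the recognizer's actual code: the function name nif_checksum_ok_case_sensitive is hypothetical, and only the letter table and the comparison logic are taken from the description above.

```python
# Control-letter table quoted in the PR description.
LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def nif_checksum_ok_case_sensitive(candidate: str) -> bool:
    """Hypothetical pre-fix check: compares the control letter as-is."""
    letter = candidate[-1]          # 'z' for "12345678z"
    number = int(candidate[:-1])    # 12345678
    return letter == LETTERS[number % 23]


print(12345678 % 23)                                 # 14
print(LETTERS[14])                                   # Z
print(nif_checksum_ok_case_sensitive("12345678Z"))   # True
print(nif_checksum_ok_case_sensitive("12345678z"))   # False: 'z' != 'Z'
```

The arithmetic is identical for both inputs; only the final string comparison differs, which is why the lowercase form is dropped even though its checksum is correct.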

Fix

Upper-case the sanitized candidate text before the mod-23 lookup, mirroring how
sanitize_value already normalizes the string (it strips dashes/spaces):

es_nif_recognizer.py: ...sanitize_value(...).upper()
es_nie_recognizer.py: ...sanitize_value(...).upper() (this normalizes the
control letter and the leading X/Y/Z prefix in one step).

The change is conservative: the mod-23 checksum still gates every match, so an
invalid-checksum ID is still rejected regardless of case — no new false
positives.
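
The effect of the fix can be sketched in isolation as follows. This is an illustration under stated assumptions, not the patched source: sanitize is a hypothetical stand-in for sanitize_value (assumed here to strip only dashes and spaces), and nif_is_valid / nie_is_valid are hypothetical names.

```python
LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"
NIE_PREFIXES = "XYZ"


def sanitize(text: str) -> str:
    """Hypothetical stand-in for sanitize_value: strip dashes and spaces."""
    return text.replace("-", "").replace(" ", "")


def nif_is_valid(text: str) -> bool:
    candidate = sanitize(text).upper()  # normalize case before the lookup
    digits, letter = candidate[:-1], candidate[-1:]
    if not digits.isdigit():
        return False
    return letter == LETTERS[int(digits) % 23]


def nie_is_valid(text: str) -> bool:
    candidate = sanitize(text).upper()  # fixes prefix and control letter at once
    if len(candidate) != 9 or candidate[0] not in NIE_PREFIXES:
        return False
    body = candidate[1:-1]
    if not body.isdigit():
        return False
    # X, Y, Z map to 0, 1, 2 and are prepended to the digits.
    number = int(str(NIE_PREFIXES.index(candidate[0])) + body)
    return candidate[-1] == LETTERS[number % 23]


print(nif_is_valid("12345678z"))  # True
print(nie_is_valid("x1234567l"))  # True
print(nif_is_valid("12345678a"))  # False: checksum still gates the match
```

Because the upper-casing happens once, right after sanitizing, every later comparison (prefix membership, prefix index, control letter) sees the same normalized string.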

Verification

Targeted tests (run from the repo root):

$ PYTHONPATH=presidio-analyzer python -m pytest \
    presidio-analyzer/tests/test_es_nif_recognizer.py \
    presidio-analyzer/tests/test_es_nie_recognizer.py -q
.........................                                                [100%]
25 passed in 0.04s

Lint on the changed source files:

$ ruff check  .../spain/es_nif_recognizer.py .../spain/es_nie_recognizer.py
All checks passed!
$ ruff format --check .../spain/es_nif_recognizer.py .../spain/es_nie_recognizer.py
2 files already formatted

New parametrized cases assert:

Valid lowercase NIF (55555555k, 12345678z) and NIE (x9613851n,
z8078221m) are now detected at score 1.0.
The uppercase forms (12345678Z, X9613851N) are still detected.
Invalid-checksum IDs (12345678a, x9613851q) stay rejected, and every
previously detected/rejected case is unchanged.
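
As a rough illustration of what such cases check, the sketch below runs a table of (text, expected match count, expected spans) tuples through a simplified NIE detector. The tuple layout is an assumption inferred from the one test case visible in the review diff; NIE_PATTERN and detect_nie are hypothetical and much simpler than the real recognizer and its pytest fixtures.

```python
import re

LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"
# Hypothetical simplified NIE pattern, compiled case-insensitively.
NIE_PATTERN = re.compile(r"\b[XYZ][0-9]{7}[A-Z]\b", re.IGNORECASE)


def detect_nie(text: str) -> tuple:
    """Return (start, end) spans of candidates that pass the mod-23 check."""
    spans = []
    for match in NIE_PATTERN.finditer(text):
        candidate = match.group().upper()
        number = int(str("XYZ".index(candidate[0])) + candidate[1:-1])
        if LETTERS[number % 23] == candidate[-1]:
            spans.append((match.start(), match.end()))
    return tuple(spans)


# (text, expected number of results, expected spans)
CASES = [
    ("X9613851N", 1, ((0, 9),)),   # uppercase still detected
    ("x9613851n", 1, ((0, 9),)),   # valid lowercase now detected
    ("z8078221m", 1, ((0, 9),)),   # valid lowercase, Z prefix
    ("x9613851q", 0, ()),          # invalid checksum stays rejected
]

for text, expected_len, expected_positions in CASES:
    spans = detect_nie(text)
    assert len(spans) == expected_len, text
    assert spans == expected_positions, text
```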

Issue reference

Note on CHANGELOG

CHANGELOG entry omitted from this patch to avoid merge conflicts with sibling PRs that all insert at the same #### Fixed anchor. Happy to add the entry once the merge order is known, or maintainers can squash it in.

Checklist

I have reviewed the contribution guidelines
I have signed the CLA (if required)
My code includes unit tests
All unit tests and lint checks pass locally
My PR contains documentation updates / additions if required


Copilot

✅ Ready to approve

The change is small, targeted, and backed by unit tests covering the reported false-negative scenarios.

Note: this review does not count toward required approvals for merging.

Pull request overview

Fixes false-negative leaks for Spanish NIF/NIE recognizers by normalizing candidate IDs to uppercase before performing the mod-23 checksum validation, ensuring valid lowercase identifiers are detected (analyzer → anonymizer flow).

Changes:

Uppercase sanitized NIF/NIE candidate text in validate_result() before checksum/prefix validation.
Add unit test cases covering valid lowercase NIF/NIE forms and invalid-checksum lowercase rejections.

File summaries

File	Description
presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/spain/es_nif_recognizer.py	Uppercases sanitized NIF text before mod-23 control-letter verification.
presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/spain/es_nie_recognizer.py	Uppercases sanitized NIE text before prefix handling and checksum verification.
presidio-analyzer/tests/test_es_nif_recognizer.py	Adds test coverage for valid lowercase NIF detection and invalid-checksum rejection.
presidio-analyzer/tests/test_es_nie_recognizer.py	Adds test coverage for valid lowercase NIE detection and invalid-checksum rejection.

Copilot's findings

Files reviewed: 4/4 changed files
Comments generated: 1


fix(es): upper-case NIF/NIE before mod-23 checksum so valid lowercase IDs are detected

433da49

Copilot AI review requested due to automatic review settings June 18, 2026 07:04

github-actions Bot added the external label Jun 18, 2026

Copilot started reviewing on behalf of AUTHENSOR June 18, 2026 07:05

Copilot AI approved these changes Jun 18, 2026


Comment thread presidio-analyzer/tests/test_es_nie_recognizer.py

Comment on lines +35 to +36

# uppercase still detected

("X9613851N", 1, ((0, 9),),),

Merge branch 'main' into fix/es-nif-nie-lowercase

5eb9f53

AUTHENSOR wants to merge 2 commits into data-privacy-stack:main from AUTHENSOR:fix/es-nif-nie-lowercase