feat(analyzer): add generic VIN and IMEI recognizers #2070
thatomokoena wants to merge 10 commits into
Add VinRecognizer and ImeiRecognizer as enabled predefined recognizers with pattern matching, context support, and checksum validation for vehicle and mobile device identifiers. Co-authored-by: Cursor <cursoragent@cursor.com>
@microsoft-github-policy-service agree
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Adds two new generic predefined recognizers (VIN and IMEI) to Presidio Analyzer and wires them into the default registry/config, along with tests and documentation updates.
Changes:
- Introduces `VinRecognizer` with a VIN regex plus check-digit validation logic.
- Introduces `ImeiRecognizer` with IMEI regex patterns plus Luhn checksum invalidation.
- Registers both recognizers in package exports, default YAML config, supported entities docs, and changelog.
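For reference, the North American check digit that `VinRecognizer` validates follows the standard mod-11 scheme: letters are transliterated to numbers, each position is multiplied by a fixed weight, and the weighted sum modulo 11 gives the ninth character (a remainder of 10 is written as `X`). The sketch below is a standalone illustration; `vin_check_digit` and `has_valid_check_digit` are hypothetical names, not the PR's code.

```python
# Character values for the North American VIN check-digit scheme.
# I, O and Q never appear in a VIN, so they are deliberately absent.
_TRANSLITERATION = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
# Positional weights; position 9 (index 8) holds the check digit, so its weight is 0.
_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def vin_check_digit(vin: str) -> str:
    """Return the expected check character ('0'-'9' or 'X') for a 17-char VIN."""
    vin = vin.upper()
    if len(vin) != 17 or any(c not in _TRANSLITERATION for c in vin):
        raise ValueError(f"not a well-formed VIN: {vin!r}")
    total = sum(_TRANSLITERATION[c] * w for c, w in zip(vin, _WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def has_valid_check_digit(vin: str) -> bool:
    """Compare the computed check character with the one at position 9."""
    return vin_check_digit(vin) == vin.upper()[8]
```

For example, `1HGCM82633A004352` (the VIN used in the PR's tests) sums to 311, and 311 mod 11 is 3, which matches its ninth character.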
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| presidio-analyzer/presidio_analyzer/predefined_recognizers/generic/vin_recognizer.py | New VIN recognizer implementation with mod-11 check digit logic |
| presidio-analyzer/presidio_analyzer/predefined_recognizers/generic/imei_recognizer.py | New IMEI recognizer implementation with Luhn validation |
| `presidio-analyzer/presidio_analyzer/predefined_recognizers/generic/__init__.py` | Exposes new recognizers from the generic package |
| `presidio-analyzer/presidio_analyzer/predefined_recognizers/__init__.py` | Exposes new recognizers at the top-level predefined module |
| presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml | Enables loading the new recognizers by default |
| presidio-analyzer/tests/test_vin_recognizer.py | Adds unit tests for VIN detection and validation behavior |
| presidio-analyzer/tests/test_imei_recognizer.py | Adds unit tests for IMEI detection and invalidation behavior |
| docs/supported_entities.md | Documents newly supported IMEI and VIN entities |
| CHANGELOG.md | Records the addition of VIN and IMEI recognizers |
Reject invalid North American VIN check digits for WMI prefixes 1-5 while preserving base scores for non-NA VINs. Remove bare 15-digit IMEI pattern to avoid collisions with AMEX and other Luhn identifiers. Co-authored-by: Cursor <cursoragent@cursor.com>
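The three outcomes in this commit map naturally onto the tri-state convention Presidio pattern recognizers use for `validate_result` (`True` boosts to the maximum score, `False` drops the match, `None` keeps the pattern score). A minimal sketch, assuming that convention and taking the check-digit result as an input; `vin_validation_outcome` and `apply_outcome` are hypothetical names, and the 0.0/1.0 defaults stand in for the recognizer's minimum and maximum scores.

```python
from typing import Optional

# Per the PR: WMIs starting with 1-5 are treated as North American.
_NORTH_AMERICAN_PREFIXES = frozenset("12345")


def vin_validation_outcome(vin: str, check_digit_ok: bool) -> Optional[bool]:
    """Tri-state decision in the spirit of validate_result.

    True  -> valid check digit: boost to the maximum score (any region).
    False -> invalid check digit on a North American VIN: reject.
    None  -> invalid check digit elsewhere: keep the base pattern score,
             since the PR treats the check digit as authoritative only
             for North American WMIs.
    """
    if check_digit_ok:
        return True
    if vin[:1] in _NORTH_AMERICAN_PREFIXES:
        return False
    return None


def apply_outcome(
    base_score: float,
    outcome: Optional[bool],
    max_score: float = 1.0,
    min_score: float = 0.0,
) -> float:
    """Translate the tri-state outcome into a final confidence score."""
    if outcome is None:
        return base_score
    return max_score if outcome else min_score
```

Returning `None` rather than `False` for non-NA mismatches is what preserves recall for European and Asian VINs, where the ninth character is not required to be a check digit.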
Wire sanitize_value through invalidate_result so custom separator replacement_pairs are honored, matching other pattern recognizers. Co-authored-by: Cursor <cursoragent@cursor.com>
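To illustrate why the separator handling matters, here is a self-contained sketch of an IMEI invalidation check: strip separators using `(old, new)` replacement pairs, then require exactly 15 digits that pass the Luhn checksum. `sanitize_value` below is a local stand-in written to mirror the shape of Presidio's helper; the default pairs and the other function names are assumptions, not the PR's code.

```python
from typing import Sequence, Tuple

# Assumed default separators, in the same (old, new) style as other recognizers.
DEFAULT_REPLACEMENT_PAIRS: Tuple[Tuple[str, str], ...] = (("-", ""), (" ", ""))


def sanitize_value(text: str, replacement_pairs: Sequence[Tuple[str, str]]) -> str:
    """Local stand-in: apply each (old, new) replacement in order."""
    for old, new in replacement_pairs:
        text = text.replace(old, new)
    return text


def luhn_is_valid(digits: str) -> bool:
    """Standard Luhn check: double every second digit counting from the right."""
    if not digits or not all("0" <= c <= "9" for c in digits):
        return False
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def imei_invalidate(
    pattern_text: str,
    replacement_pairs: Sequence[Tuple[str, str]] = DEFAULT_REPLACEMENT_PAIRS,
) -> bool:
    """Return True when the match should be discarded (invalidate_result semantics)."""
    digits = sanitize_value(pattern_text, replacement_pairs)
    return len(digits) != 15 or not luhn_is_valid(digits)
```

With the default pairs, `49.015420.323751.8` is discarded because the dots survive sanitization; passing `[(".", "")]` lets the same value through, which is the behavior this commit restores for custom `replacement_pairs`.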
Align IMEI Luhn description and VIN validate_result return docs with actual behavior per Copilot review feedback. Co-authored-by: Cursor <cursoragent@cursor.com>
Hi @SharonHart @omri374. When you have time, would you be willing to take a look at this PR for adding IMEI and VIN recognizers? Happy to address any feedback. Thanks!
Reflect two new enabled predefined recognizers in the registry test assertion. Co-authored-by: Cursor <cursoragent@cursor.com>
Hi @SharonHart. The CI failure on PR #2070 is fixed, so CI should be green on the next run. When you have a moment, could you take another look? Thanks!
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…rivacy-stack#2070 Document space-delimited IMEI formats in the changelog and add ImeiRecognizer and VinRecognizer to PREDEFINED_RECOGNIZERS so default engine tests cover the new recognizers. Co-authored-by: Cursor <cursoragent@cursor.com>
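The PR describes a formatted-only IMEI pattern (`##-######-######-#`), and this commit documents space-delimited variants. A sketch of such matching, assuming hyphen or space separators and requiring the same separator throughout via a backreference; the regex and the `find_imei_candidates` helper are illustrative and not necessarily what the PR ships.

```python
import re
from typing import List

# Hypothetical pattern: 2-6-6-1 digit groups joined by one consistent separator
# (hyphen or space). [0-9] is used instead of \d so non-ASCII digits are ignored.
IMEI_FORMATTED = re.compile(r"\b[0-9]{2}([- ])[0-9]{6}\1[0-9]{6}\1[0-9]\b")


def find_imei_candidates(text: str) -> List[str]:
    """Return matched substrings; the Luhn check would run as a separate step."""
    return [m.group(0) for m in IMEI_FORMATTED.finditer(text)]
```

Because the pattern requires separators, an undelimited 15-digit Luhn-valid string such as the well-known AMEX test number `378282246310005` is never proposed as an IMEI, which is the collision the PR avoids by dropping the bare 15-digit pattern.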
|IBAN_CODE|The International Bank Account Number (IBAN) is an internationally agreed system of identifying bank accounts across national borders to facilitate the communication and processing of cross border transactions with a reduced risk of transcription errors.|Pattern match, context and checksum|
|IMEI|International Mobile Equipment Identity, a 15-digit identifier for mobile devices.|Pattern match, context and checksum|
|IP_ADDRESS|An Internet Protocol (IP) address (either IPv4 or IPv6).|Pattern match, context and checksum|
("Vehicle VIN is 1HGCM82633A004352", 1, ((15, 32),), ((0.5, "max"),)),
("chassis number 1HGCM82633A004352 recorded", 1, ((15, 32),), ((0.5, "max"),)),
("vin: 1hgcm82633a004352", 1, ((5, 22),), ((0.5, "max"),)),
("The vehicle identification number is 1HGCM82633A004352", 1, ((37, 54),), ((0.5, "max"),)),
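Each expected position in these tuples is a zero-based `(start, end)` character span with an exclusive end, so `text[start:end]` is exactly the 17-character VIN. A quick way to sanity-check the offsets, using a simplified VIN regex that is an assumption rather than the recognizer's actual pattern:

```python
import re

# Simplified VIN shape: 17 characters, excluding I, O and Q (hypothetical pattern).
VIN = re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b", re.IGNORECASE)

# Text and expected span copied from the test parameters above.
cases = [
    ("Vehicle VIN is 1HGCM82633A004352", (15, 32)),
    ("chassis number 1HGCM82633A004352 recorded", (15, 32)),
    ("vin: 1hgcm82633a004352", (5, 22)),
    ("The vehicle identification number is 1HGCM82633A004352", (37, 54)),
]

for text, expected in cases:
    match = VIN.search(text)
    assert match is not None and match.span() == expected, (text, match)
    # The end offset is exclusive, so slicing recovers exactly the VIN.
    assert len(text[expected[0]:expected[1]]) == 17
```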
if fn_score == "max":
    fn_score = max_score
assert_result_within_score_range(
Change Description
- Adds `VinRecognizer` for `VIN` (17-character ISO 3779 vehicle identifiers) with pattern matching, context words, and North American mod-11 check-digit validation. Valid check digits boost confidence to `MAX_SCORE` for any region; invalid check digits are rejected for North American WMIs (prefixes 1-5); non-NA mismatches keep the base pattern score.
- Adds `ImeiRecognizer` for `IMEI` (15-digit mobile device identifiers) with a formatted pattern (`##-######-######-#`), context words, and Luhn checksum invalidation. Bare 15-digit matching is omitted to avoid collisions with other Luhn-validated identifiers such as AMEX credit card numbers.
- Registers both recognizers in `default_recognizers.yaml` (enabled by default), exports them from `predefined_recognizers`, and documents them in `supported_entities.md` and `CHANGELOG.md`.
- Adds `test_vin_recognizer.py` and `test_imei_recognizer.py`; all tests pass locally.
- `ruff check` passes on the new recognizer source files.

Issue reference
N/A
Checklist