
Fix IN vehicle registration under-scoring zero-padded district codes#2110

Open
jichaowang02-lang wants to merge 1 commit into
data-privacy-stack:mainfrom
jichaowang02-lang:fix/in-vehicle-zero-padded-district

Conversation

@jichaowang02-lang

Contributor

Change Description

InVehicleRegistrationRecognizer validates the district code against state_rto_district_map. When both characters after the state code are digits, the parser produces a two-digit district code ("DL01" -> "01"), but several states store their district codes as single digits ("1".."9"). As a result, a valid Delhi/Gujarat plate written in the standard zero-padded form fails the district lookup and stays at the base score of 0.5 instead of being promoted to 1.0:

r = InVehicleRegistrationRecognizer()
r.analyze("DL01CA1234", ["IN_VEHICLE_REGISTRATION"])[0].score   # 0.5  (DL-01 is a real district)
r.analyze("DL3CJI0001", ["IN_VEHICLE_REGISTRATION"])[0].score   # 1.0  (single-digit form works)
r.analyze("DL13CA1234", ["IN_VEHICLE_REGISTRATION"])[0].score   # 1.0  (two-digit form works)

Only the zero-padded single-digit districts (DL/GJ 01–09) are affected; states that store the padded form (e.g. OD, which stores "02") already worked.

Fix

Strip the leading zero (str(int(dist_code))) and try the result as an additional lookup. The check is purely additive: padded-form states keep matching via the original check, and invalid districts (DL99, DL00) are still not promoted.
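For reviewers who want to see the shape of the check, here is a minimal standalone sketch of the additive lookup. It is not the recognizer's actual code: `STATE_RTO_DISTRICT_MAP` is a trimmed-down, illustrative stand-in for state_rto_district_map, and `is_known_district` is a hypothetical helper name used only for this example.

```python
# Illustrative stand-in for state_rto_district_map (values are assumptions,
# chosen only to show one unpadded state and one padded state).
STATE_RTO_DISTRICT_MAP = {
    "DL": {"1", "2", "3", "13"},  # districts stored without a leading zero
    "OD": {"02", "05"},           # districts stored zero-padded
}


def is_known_district(state: str, dist_code: str) -> bool:
    """Hypothetical helper: True if dist_code is a known district of state."""
    districts = STATE_RTO_DISTRICT_MAP.get(state, set())

    # Original check: exact match, which covers states storing the padded form.
    if dist_code in districts:
        return True

    # Additional lookup: drop the leading zero so "01" can match "1".
    if dist_code.isdecimal():
        return str(int(dist_code)) in districts

    return False


print(is_known_district("DL", "01"))  # padded form of a single-digit district
print(is_known_district("DL", "00"))  # normalises to "0", which is not a district
```

Because the normalised form is tried only after the exact match fails, states that store padded codes behave exactly as before, and codes such as "99" or "00" still find no entry.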

Checklist

  • I have reviewed the contribution guidelines
  • I have added tests to cover my changes
  • All new and existing tests passed

Tests

$ pytest tests/test_in_vehicle_registration_recognizer.py -q
11 passed

Adds DL01CA1234 and GJ09AB1234 (expected score 1.0) — both score 0.5 before the fix. The existing OD02BA2341 (already padded), DL3CJI0001, and invalid-district KA99ME3456 cases are unchanged.

The district-code validity check compares the parsed code against
state_rto_district_map. Some states store their district codes single-digit
("1".."9"), but the parser produces the zero-padded form ("DL01" -> "01") when
both characters after the state are digits. So a valid Delhi/Gujarat plate
written in the standard zero-padded form (DL01..DL09, GJ01..GJ09) failed the
district lookup and validate_result kept it at the base 0.5 score instead of
promoting it to 1.0 — even though "DL3" (single-digit) and "DL13" (two-digit)
were promoted correctly. (Other states, e.g. OD, store the padded form and
already worked.)

Normalise the leading zero (`str(int(dist_code))`) as an additional lookup, so
a zero-padded district still matches a single-digit entry. The check is purely
additive — states that store the padded form keep matching, and invalid
districts (e.g. DL99, DL00) are still not promoted.

Adds DL01/GJ09 cases (previously scored 0.5).
Copilot AI review requested due to automatic review settings June 27, 2026 16:11

Copilot AI left a comment


Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.
