feat: detect duplicate recipients in CSV uploads (#3319)#416
Open
smcmurtry wants to merge 1 commit into
Open
Conversation
Adds duplicate-recipient detection to RecipientCSV so that admin can warn senders before a bulk send when their CSV contains the same recipient more than once. Detection is case-insensitive, ignores leading/trailing whitespace, and (for SMS) treats phone numbers in different formats as equivalent. Letters are excluded because multiple recipients can legitimately share an address. The new properties (has_duplicate_recipients, count_of_unique_duplicate_recipients, count_of_duplicate_recipient_rows, rows_with_duplicate_recipients) are non-blocking: they do *not* affect has_errors. Admin can use them to render a warning banner and a download-duplicates link without preventing the user from sending. Refs: cds-snc/notification-planning#3319
Contributor
There was a problem hiding this comment.
Pull request overview
Adds non-blocking duplicate-recipient detection to RecipientCSV so the admin app can warn senders when a CSV repeats the same recipient (email/SMS), while leaving has_errors unchanged.
Changes:
- Add recipient normalisation + duplicate detection properties/generators to
RecipientCSV(email case/trim insensitive; SMS via E.164 normalisation; letters excluded). - Add a new test suite covering duplicate detection scenarios and ensuring duplicates remain warnings.
- Bump package version.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
notifications_utils/recipients.py |
Introduces recipient normalisation and exposes duplicate-recipient warning properties on RecipientCSV. |
tests/test_recipient_csv.py |
Adds tests for duplicate detection behavior and non-blocking warning semantics. |
pyproject.toml |
Increments library version. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+333
to
+357
| @property | ||
| def _duplicate_recipient_row_indices(self): | ||
| """ | ||
| Returns a set of row indices for rows whose recipient value has already | ||
| appeared in an earlier row. The first occurrence of each recipient is | ||
| not flagged. Rows with bad or missing recipients are skipped, and | ||
| duplicate detection is disabled for letter templates (where multiple | ||
| recipients can legitimately share an address). | ||
| """ | ||
| if self.template_type == "letter": | ||
| return set() | ||
| seen = set() | ||
| duplicate_indices = set() | ||
| for row in self.rows: | ||
| if row is None: | ||
| continue | ||
| if row.has_bad_recipient or row.recipient is None: | ||
| continue | ||
| normalised = self._normalise_recipient_for_dedupe(row.recipient) | ||
| if normalised is None: | ||
| continue | ||
| if normalised in seen: | ||
| duplicate_indices.add(row.index) | ||
| else: | ||
| seen.add(normalised) |
Comment on lines
+385
to
+400
| @property | ||
| def count_of_unique_duplicate_recipients(self): | ||
| """Number of distinct recipients that appear more than once.""" | ||
| if self.template_type == "letter": | ||
| return 0 | ||
| counts: Dict[str, int] = {} | ||
| for row in self.rows: | ||
| if row is None: | ||
| continue | ||
| if row.has_bad_recipient or row.recipient is None: | ||
| continue | ||
| normalised = self._normalise_recipient_for_dedupe(row.recipient) | ||
| if normalised is None: | ||
| continue | ||
| counts[normalised] = counts.get(normalised, 0) + 1 | ||
| return sum(1 for count in counts.values() if count > 1) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Detect duplicate recipients in CSV uploads — closes cds-snc/notification-planning#3319
Summary
Adds duplicate-recipient detection to
RecipientCSVso admin can warn sendersbefore a bulk send when their CSV contains the same recipient more than once.
What's new
RecipientCSVexposes four new properties (none of which contribute tohas_errors, so they remain non-blocking warnings):has_duplicate_recipientsTruewhen at least one recipient appears in two or more rows.count_of_unique_duplicate_recipientscount_of_duplicate_recipient_rowsrows_with_duplicate_recipientsRowobjects for the duplicate rows (the first occurrence is not yielded).Detection rules
Alice@Example.comandALICE@EXAMPLE.COMare duplicates.6502532222,+1 650-253-2222and650 253 2222are recognised as the same recipient.the dedupe set).
occurrence stays as the canonical row.
Why a non-blocking warning?
Senders sometimes intentionally send the same notification to the same
recipient (e.g. resend to a separate intake team). The acceptance criteria in
the issue call for a warning, not a hard block —
has_errorsis unaffected.Tests
A new
TestDuplicateRecipientsclass intests/test_recipient_csv.pycovers:has_errorsunaffected by duplicatesFull
tests/test_recipient_csv.pytest suite passes (132 passed, 7 xfailed).ruff check,ruff format --check, andmypy notifications_utils/recipients.pyall pass.
Companion PR
The admin-side warning UI ships in
notification-admin #fix/3319-warn-duplicate-recipients.
Closes cds-snc/notification-planning#3319.