feat: detect duplicate recipients in CSV uploads (#3319) #416

Open
smcmurtry wants to merge 1 commit into main from fix/3319-warn-duplicate-recipients


@smcmurtry

Contributor

Detect duplicate recipients in CSV uploads — closes cds-snc/notification-planning#3319

Summary

Adds duplicate-recipient detection to RecipientCSV so admin can warn senders
before a bulk send when their CSV contains the same recipient more than once.

What's new

RecipientCSV exposes four new properties (none of which contribute to
has_errors, so they remain non-blocking warnings):

| Property | Description |
| --- | --- |
| has_duplicate_recipients | True when at least one recipient appears in two or more rows. |
| count_of_unique_duplicate_recipients | Number of distinct recipients that appear more than once. |
| count_of_duplicate_recipient_rows | Total number of extra duplicate rows (everything beyond the first occurrence). |
| rows_with_duplicate_recipients | Generator yielding Row objects for the duplicate rows (the first occurrence is not yielded). |
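
As a rough illustration of how a caller might consume these properties, here is a minimal sketch. The `duplicate_warning` helper and the `SimpleNamespace` stand-in are hypothetical and exist only for this example. The real object would be a RecipientCSV instance, and the actual admin-side wording lives in the companion PR.

```python
from types import SimpleNamespace


def duplicate_warning(recipients):
    """Build a non-blocking warning message, or return None if there are no duplicates.

    `recipients` is assumed to expose the properties described in the table above.
    """
    if not recipients.has_duplicate_recipients:
        return None
    return (
        f"{recipients.count_of_unique_duplicate_recipients} recipient(s) appear more than once "
        f"({recipients.count_of_duplicate_recipient_rows} extra row(s))."
    )


# Hypothetical stand-in for a RecipientCSV with two repeated recipients
# spread over three extra rows. has_errors stays False: duplicates only warn.
fake = SimpleNamespace(
    has_duplicate_recipients=True,
    count_of_unique_duplicate_recipients=2,
    count_of_duplicate_recipient_rows=3,
    has_errors=False,
)

print(duplicate_warning(fake))
# 2 recipient(s) appear more than once (3 extra row(s)).
```

Because the message is derived only from the duplicate properties, a caller can show it alongside the send button without consulting has_errors.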

Detection rules

  • Case-insensitive — Alice@Example.com and ALICE@EXAMPLE.COM are duplicates.
  • Whitespace-insensitive — leading/trailing whitespace is ignored.
  • Phone numbers are normalised to E.164 — 6502532222, +1 650-253-2222 and
    650 253 2222 are recognised as the same recipient.
  • Bad-recipient and missing-recipient rows are skipped (so they don't poison
    the dedupe set).
  • Letters are excluded (recipients can legitimately share an address).
  • Only the second and later copies of a recipient are flagged. The first
    occurrence stays as the canonical row.
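
The rules above can be sketched as a standalone key function. This is an illustration only, not the code in this PR: `normalise_for_dedupe` is a hypothetical name, and the phone handling is a deliberate simplification that assumes North American numbers instead of performing full E.164 normalisation.

```python
import re


def normalise_for_dedupe(value, template_type):
    """Return a comparison key for a recipient, or None if no key can be derived."""
    cleaned = value.strip()  # whitespace-insensitive
    if not cleaned:
        return None
    if template_type == "email":
        return cleaned.lower()  # case-insensitive
    if template_type == "sms":
        digits = re.sub(r"\D", "", cleaned)  # drop spaces, dashes, "+", brackets
        # Assumption for this sketch: a bare 10-digit number is North American (+1).
        if len(digits) == 10:
            digits = "1" + digits
        if len(digits) == 11 and digits.startswith("1"):
            return "+" + digits
        return None  # anything else is out of scope for this simplified version
    return None  # letters are excluded from duplicate detection


print(normalise_for_dedupe("  Alice@Example.com ", "email"))  # alice@example.com
print(normalise_for_dedupe("6502532222", "sms"))              # +16502532222
print(normalise_for_dedupe("+1 650-253-2222", "sms"))         # +16502532222
print(normalise_for_dedupe("650 253 2222", "sms"))            # +16502532222
```

Returning None for values that cannot be normalised mirrors the rule that bad or missing recipients are skipped and never enter the dedupe set.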

Why a non-blocking warning?

Senders sometimes intentionally send the same notification to the same
recipient (e.g. resend to a separate intake team). The acceptance criteria in
the issue call for a warning, not a hard block — has_errors is unaffected.

Tests

A new TestDuplicateRecipients class in tests/test_recipient_csv.py covers:

  • no false positives when all recipients are unique
  • exact duplicates (one occurrence flagged)
  • case- and whitespace-insensitive matching
  • counts of unique duplicates vs. duplicate rows
  • phone numbers in different formats
  • skipping bad/missing recipients
  • letter templates exempt
  • has_errors unaffected by duplicates

Full tests/test_recipient_csv.py test suite passes (132 passed, 7 xfailed).
ruff check, ruff format --check, and mypy notifications_utils/recipients.py
all pass.

Companion PR

The admin-side warning UI ships in
notification-admin #fix/3319-warn-duplicate-recipients.

Closes cds-snc/notification-planning#3319.

Adds duplicate-recipient detection to RecipientCSV so that admin can warn
senders before a bulk send when their CSV contains the same recipient
more than once. Detection is case-insensitive, ignores leading/trailing
whitespace, and (for SMS) treats phone numbers in different formats as
equivalent. Letters are excluded because multiple recipients can
legitimately share an address.

The new properties (has_duplicate_recipients,
count_of_unique_duplicate_recipients, count_of_duplicate_recipient_rows,
rows_with_duplicate_recipients) are non-blocking: they do *not* affect
has_errors. Admin can use them to render a warning banner and a
download-duplicates link without preventing the user from sending.

Refs: cds-snc/notification-planning#3319
Copilot AI review requested due to automatic review settings June 17, 2026 21:00

Copilot AI left a comment

Contributor

Pull request overview

Adds non-blocking duplicate-recipient detection to RecipientCSV so the admin app can warn senders when a CSV repeats the same recipient (email/SMS), while leaving has_errors unchanged.

Changes:

  • Add recipient normalisation + duplicate detection properties/generators to RecipientCSV (email case/trim insensitive; SMS via E.164 normalisation; letters excluded).
  • Add a new test suite covering duplicate detection scenarios and ensuring duplicates remain warnings.
  • Bump package version.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| notifications_utils/recipients.py | Introduces recipient normalisation and exposes duplicate-recipient warning properties on RecipientCSV. |
| tests/test_recipient_csv.py | Adds tests for duplicate detection behavior and non-blocking warning semantics. |
| pyproject.toml | Increments library version. |

Comment on lines +333 to +357
    @property
    def _duplicate_recipient_row_indices(self):
        """
        Returns a set of row indices for rows whose recipient value has already
        appeared in an earlier row. The first occurrence of each recipient is
        not flagged. Rows with bad or missing recipients are skipped, and
        duplicate detection is disabled for letter templates (where multiple
        recipients can legitimately share an address).
        """
        if self.template_type == "letter":
            return set()
        seen = set()
        duplicate_indices = set()
        for row in self.rows:
            if row is None:
                continue
            if row.has_bad_recipient or row.recipient is None:
                continue
            normalised = self._normalise_recipient_for_dedupe(row.recipient)
            if normalised is None:
                continue
            if normalised in seen:
                duplicate_indices.add(row.index)
            else:
                seen.add(normalised)
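
To make the first-occurrence rule in the excerpt above concrete, here is a minimal standalone sketch of the same seen-set scan. It assumes the keys have already been normalised and that None marks a skipped row. `find_duplicate_indices` is a hypothetical helper, not part of RecipientCSV.

```python
def find_duplicate_indices(keys):
    """Return the positions of keys that repeat an earlier key.

    None entries stand in for rows with bad or missing recipients and are skipped.
    """
    seen = set()
    duplicates = set()
    for index, key in enumerate(keys):
        if key is None:
            continue
        if key in seen:
            duplicates.add(index)  # second or later copy: flag it
        else:
            seen.add(key)  # first occurrence: remember it, do not flag
    return duplicates


keys = ["a@x.com", "b@x.com", "a@x.com", None, "a@x.com", None]
print(sorted(find_duplicate_indices(keys)))  # [2, 4]
```

Index 0 is never flagged even though its value repeats, and the two None entries do not count as duplicates of each other.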
Comment on lines +385 to +400
    @property
    def count_of_unique_duplicate_recipients(self):
        """Number of distinct recipients that appear more than once."""
        if self.template_type == "letter":
            return 0
        counts: Dict[str, int] = {}
        for row in self.rows:
            if row is None:
                continue
            if row.has_bad_recipient or row.recipient is None:
                continue
            normalised = self._normalise_recipient_for_dedupe(row.recipient)
            if normalised is None:
                continue
            counts[normalised] = counts.get(normalised, 0) + 1
        return sum(1 for count in counts.values() if count > 1)
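
The difference between "unique duplicates" and "duplicate rows" can be shown with a small standalone sketch. It swaps the hand-rolled dict in the excerpt for `collections.Counter` and works on pre-normalised keys. `duplicate_counts` is a hypothetical helper used only for illustration.

```python
from collections import Counter


def duplicate_counts(keys):
    """Return (distinct recipients seen more than once, extra rows beyond each first copy)."""
    counts = Counter(key for key in keys if key is not None)
    unique_duplicates = sum(1 for n in counts.values() if n > 1)
    extra_rows = sum(n - 1 for n in counts.values() if n > 1)
    return unique_duplicates, extra_rows


# "a" appears three times, "b" twice, "c" once; None is a skipped row.
print(duplicate_counts(["a", "b", "a", "a", "b", "c", None]))  # (2, 3)
```

Both numbers come from one pass over the same tally, which is one way the two properties could share their row-walking logic.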
Development

Successfully merging this pull request may close these issues.

[admin/utils] Bulk-send CSV uploads do not warn about duplicate recipients