Addressed review suggestions in issue #1178 (#1266)
Pull request overview
This PR addresses issue #1178 by tightening validation and handling of hedId values across schema validation and TSV/dataframe IO, and adding/adjusting regression tests around these behaviors.
Changes:
- Added a new schema compliance check to detect duplicate hedId values across all schema sections and extended the existing SCHEMA_HED_ID_INVALID message to report duplicates.
- Fixed multiple hedId-related edge cases in dataframe/schema synchronization (skip extras correctly, guard out-of-range checks when the valid-range set is empty, and ensure ID assignment mutates the original DataFrame).
- Updated and expanded unit tests to cover invalid hedId formats, empty/unknown library ranges, dataframe mutation, and mismatch detection.
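For readers who want the idea behind the duplicate check, here is a minimal sketch of detecting a hedId that is reused across sections. It is not the PR's check_duplicate_hed_ids(): the function name find_duplicate_hed_ids, the dict-of-dicts input shape, and the sample IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def find_duplicate_hed_ids(sections):
    """Return {hed_id: [(section, entry), ...]} for every id used more than once.

    `sections` maps a section name to an {entry_name: hed_id} dict. This shape
    is hypothetical and is not the real schema object model.
    """
    seen = defaultdict(list)
    for section_name, entries in sections.items():
        for entry_name, hed_id in entries.items():
            if hed_id:  # entries without a hedId cannot collide
                seen[hed_id].append((section_name, entry_name))
    # Keep only the ids that appear in more than one place
    return {hed_id: uses for hed_id, uses in seen.items() if len(uses) > 1}


# Made-up sample data: the unit "second" reuses the id of the tag "Event"
sections = {
    "tags": {"Event": "HED_0012001", "Sensory-event": "HED_0012002"},
    "units": {"second": "HED_0012001"},
}
duplicates = find_duplicate_hed_ids(sections)
```

Collecting every use of an id, rather than stopping at the first collision, is what lets a message name both the offending entry and the section of the entry it collides with.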
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/schema/test_schema_validator_hed_id.py | Strengthens HedIDValidator.verify_tag_id assertions and adds invalid-format coverage. |
| tests/schema/test_schema_compliance.py | Updates expected compliance check list to include duplicate_hed_ids. |
| tests/schema/test_hed_id_util.py | Adds regression tests for empty-range handling, dataframe mutation, and mismatch detection. |
| hed/schema/schema_validation/hed_id_validator.py | Tweaks “nothing to verify” condition for hedId validation. |
| hed/schema/schema_validation/compliance.py | Adds check_duplicate_hed_ids() and wires it into compliance flow. |
| hed/schema/schema_io/hed_id_util.py | Fixes extras skipping, guards range checks on empty sets, and corrects dataframe mutation during ID assignment. |
| hed/schema/schema_io/df_util.py | Stops silently swallowing OS errors when loading TSVs; only ignores missing files. |
| hed/errors/schema_error_messages.py | Extends SCHEMA_HED_ID_INVALID message formatting to support duplicate-id reporting. |
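The df_util.py row describes a common pattern: treat a missing file as "no data" but let every other OS error surface. A rough sketch of that pattern using only the standard library follows. The helper name load_tsv_rows and its return type are assumptions; the actual loader in the PR works with dataframes and may differ.

```python
import csv
import os
import tempfile


def load_tsv_rows(path):
    """Return the rows of a TSV file, or [] if the file does not exist.

    Only FileNotFoundError is swallowed. Other OSError subclasses, such as
    PermissionError or IsADirectoryError, propagate to the caller.
    """
    try:
        with open(path, newline="", encoding="utf-8") as handle:
            return list(csv.reader(handle, delimiter="\t"))
    except FileNotFoundError:
        return []


with tempfile.TemporaryDirectory() as tmp:
    # A missing file is treated as empty
    missing = load_tsv_rows(os.path.join(tmp, "absent.tsv"))
    try:
        # Opening a directory fails with an OS error other than "not found"
        load_tsv_rows(tmp)
        surfaced = False
    except OSError:
        surfaced = True
```

Catching the narrow FileNotFoundError instead of the broad OSError keeps permission or path problems from being mistaken for an empty schema file.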
PR Review: fix_extras (#1266)
This PR addresses three pre-existing bugs in schema hedId handling and adds a new duplicate-ID compliance check. Overall the changes are correct and well-tested.
Critical bug fixed:
PR Review: Fix extras and HED ID validation improvements
This PR fixes three real bugs and adds a new duplicate-hedId compliance check. Overall the changes are correct and well-structured. One important gap and one minor issue are noted below.
What this PR does well:
Important: Suggestion:
PR Review summary for #1266
Overall: the PR is well-reasoned and addresses the issues from #1178 with no blocking problems. All five bug fixes are correct and the new check_duplicate_hed_ids compliance check is well-implemented.
Bug fixes verified:
Minor issue: In schema_error_messages.py, the new duplicate-id error branch has a double space before "Each hedId must be unique." The other messages in the same function use a single space after the period. Not a blocker, but worth a one-character fix.
    if duplicate_tag:
        section_info = f" in the '{duplicate_tag_section}' section" if duplicate_tag_section else ""
        return (
            f"Tag '{tag}' has hedId '{new_id}' which is already used by '{duplicate_tag}'{section_info}.  "
Suggestion: double space before "Each hedId" — should be a single space to match the style of the other messages in this function.
- f"Tag '{tag}' has hedId '{new_id}' which is already used by '{duplicate_tag}'{section_info}.  "
+ f"Tag '{tag}' has hedId '{new_id}' which is already used by '{duplicate_tag}'{section_info}. "
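A whitespace slip like this is easy to miss in review because adjacent string literals are joined silently. One way to guard against it is a small regression check. The sketch below uses a hypothetical stand-in function, duplicate_id_message, shaped after the reviewed snippet. The real function's name and signature are not shown in this thread, and the sample values are made up.

```python
import re


def duplicate_id_message(tag, new_id, duplicate_tag, duplicate_tag_section=""):
    # Hypothetical stand-in for the duplicate-id branch under review
    section_info = f" in the '{duplicate_tag_section}' section" if duplicate_tag_section else ""
    return (
        # Adjacent literals are concatenated, so the trailing space here is
        # the only separator before the next sentence
        f"Tag '{tag}' has hedId '{new_id}' which is already used by '{duplicate_tag}'{section_info}. "
        "Each hedId must be unique."
    )


message = duplicate_id_message("Event", "HED_0012001", "second", "units")
# Flag any run of two or more spaces anywhere in the rendered message
has_double_space = re.search(r" {2,}", message) is not None
```

Asserting on the rendered message, rather than on the source literals, catches the problem regardless of which literal carries the extra space.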
@copilot @claude Evaluate whether this PR addresses issues raised in #1178 and whether other improvements should be made.