Skip to content

feat(chord): validate translation constraints on create/update#705

Open
SanjeevLakhwani wants to merge 14 commits into
developfrom
feature/dataset-translation-validation
Open

feat(chord): validate translation constraints on create/update#705
SanjeevLakhwani wants to merge 14 commits into
developfrom
feature/dataset-translation-validation

Conversation

@SanjeevLakhwani

@SanjeevLakhwani SanjeevLakhwani commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Summary

Redmine ticket 2830

  • Reject translations where primary_contact or any stakeholders entry has different roles than the canonical (English) dataset
  • Reject translations where any list field (keywords, stakeholders, taxa, links, publications, logos, counts, resources, participant_criteria, domain, funding_sources) grows beyond the canonical size
  • Validation runs in DatasetTranslationSerializer.to_internal_value() after Pydantic validation, covering both create and update paths; looks up the canonical dataset by translation.identifier so it works regardless of how the serializer is instantiated
  • 11 new tests added covering rejection and acceptance cases for both rules on both POST and PUT

Reject translations where primary_contact or stakeholder roles differ
from the canonical dataset, or where any list field gains items.
Clients no longer need to supply identifier or project in the
translation request body; the serializer injects them from the dataset
looked up via the URL identifier or the existing instance on update.
Previously only list fields were checked. Non-list optional fields
(e.g. spatial_coverage, program_name) could be silently nulled out
in translations. Now any field present in canonical must also be
present in the translation; list fields must additionally match length.
…ions

Translations cannot include a discovery configuration (rejected before
Pydantic parsing, since extra="allow" would silently accept it).

Non-translatable fields (version, release_date, last_modified,
study_status, study_context) must exactly match the canonical dataset
value; providing a different value or adding one where canonical has
none is rejected.
Immutable fields (version, release_date, last_modified, study_status,
study_context) are now exempt from Rule 2 (removal check). Rule 3 only
fires when the field is explicitly provided and differs from canonical.
Add discovery and dac_id to _IMMUTABLE_FIELDS so Rule 2 does not
require them to be present in a translation. Rule 3 still rejects them
if explicitly provided with a value that differs from canonical.
Comment thread chord_metadata_service/chord/serializers.py
"discovery", "pcgl_dac_id"})


def _check_translation_constraints(translation: ProjectScopedDatasetModel):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is done in serialization, but I feel like it should be done in the actual model save method instead to prevent invalid translations from ever being created, not just in the API (e.g., in a hypothetical CLI.) this is my general preference for MVC apps, to have validation closer to the data. however, if there's a reason this is not as good, i'm fine with being convinced otherwise.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model is just taking a JSON field, I don't see how close to the model we can get with that. Given that in the past we discussed about the effort to be done in implementing this and was asked to be low. It is known that a tech debt is being added. My idea for the future for this is to separate main model from translation. Everything gets serialized into a different translation model, which allows to not store duplicates of data that shouldn't be and have validation against canonical.

Comment thread chord_metadata_service/chord/serializers.py
@SanjeevLakhwani SanjeevLakhwani changed the base branch from refactor/unify-datasets-v2-endpoint to develop June 15, 2026 20:06
@codecov

codecov Bot commented Jun 16, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 89.06250% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.51%. Comparing base (de23adf) to head (df31022).

Files with missing lines Patch % Lines
chord_metadata_service/chord/serializers.py 88.52% 5 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #705      +/-   ##
===========================================
- Coverage    95.59%   95.51%   -0.08%     
===========================================
  Files          138      138              
  Lines         5805     5867      +62     
  Branches       552      568      +16     
===========================================
+ Hits          5549     5604      +55     
- Misses         205      210       +5     
- Partials        51       53       +2     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

This reverts commit 75e3880.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants