Issue Title
Define a GA4GH LinkML transformation model for cross-schema mappings
Issue Type
Schema Alignment
Problem Statement
GA4GH currently has active work on schema alignment, identifier governance, data model best practices, and interoperability across products, but there does not yet appear to be a harmonized way to represent transformation rules between non-GA4GH schemas and GA4GH standards as machine-readable, governed artifacts.
This gap becomes operationally significant when an external schema must be mapped into GA4GH VRS. In a PXF protobuf to VRS 1.3 workflow, implementers must make explicit decisions about:
- which PXF elements correspond to which VRS elements
- whether a mapping is a direct replacement, transformation, normalization, computation, copy, or no-mapping case
- how coordinate systems are converted
- what validation rules apply
- how cardinality mismatches are handled
- how unmapped concepts are represented
- what acceptance and error states should be recorded
- how mapping behavior is versioned as either schema evolves
Without a shared artifact model, those decisions are typically embedded in custom code, local tables, or prose documents. That leads to several technical risks:
- semantic drift across adopters mapping the same source schema into VRS
- inconsistent identifier handling and provenance
- inconsistent normalization logic, especially for coordinates and intervals
- lack of a machine-readable contract for validation and testing
- inability to compare, diff, review, or reuse transformation rules across teams
- no common way to publish transformation metadata for downstream tooling or governance
The desired state is for GA4GH to define a reusable, machine-readable transformation model that captures source-to-target mapping semantics in a standard form. Success would mean that transformations from external schemas into VRS can be treated as explicit interoperability artifacts rather than opaque implementation details.
Scope Validation
✅ Harmonization Impact:
This issue directly supports harmonization by proposing a common way to represent the semantic mapping layer between external schemas and GA4GH standards. It would make transformation behavior inspectable and comparable across adopters.
✅ Barrier Reduction:
It reduces barriers caused by duplicated mapping design, unclear transformation semantics, and poor discoverability of prior work. It also supports repeatable validation and conformance testing.
✅ Alignment Challenges:
This issue addresses concrete alignment challenges involving:
- identifiers and canonical references to schema elements
- field-level correspondence between models
- coordinate normalization
- type and cardinality mismatch handling
- versioning of schema-to-schema mappings
- publication and governance of transformation artifacts
✅ Cross-Work Stream:
Yes. The pilot target is VRS within GKS, but the transformation model is broadly relevant to schema alignment across GA4GH. It intersects with DaMaSC best practices, schema registry work, interoperability touchpoints, and implementation guidance.
Proposed Solution(s)
I recommend that TASC evaluate and potentially define a minimal GA4GH transformation model using LinkML, with PXF to VRS as the first pilot use case.
The purpose of using LinkML here is to standardize the structure and semantics of transformation artifacts, not to execute them. LinkML would define what a valid mapping record looks like; a separate compiler or runtime would still execute the transformation logic.
Recommended Model
Minimally define the following classes:
- TransformationRecord
- SourceElement
- TargetElement
- MappingAction
- CompilerMetadata
A transformation record should include:
Source metadata:
- source element name
- canonical source element URI
- source schema identifier and version
- source type
- source path
- cardinality
Target metadata:
- target standard identifier
- target version
- target class or object type
- canonical target element URI or schema reference
- structural path such as JSON Pointer
Transformation metadata:
- action type, such as REPLACE, TRANSFORM, NORMALIZE, COMPUTE, COPY, CONCAT, or NONE
- human-readable description
- machine-readable expression or rule reference
- validation rules
- normalization guidance
- fallback behavior
Quality and governance metadata:
- acceptance status, such as OK, WARN, or FAIL
- error code taxonomy
- provenance
- compiler version
- timestamp
- notes
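A deterministic record_key may also be useful as an identifier for indexing, comparison, and governance, provided its derivation is defined consistently across implementations (for example, as a hash over a fixed set of source, target, and version fields).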
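To make the shape of such a record concrete, the following is a minimal Python sketch rather than a proposed schema. It covers only a subset of the fields listed above. The field names, the example URI and paths, and the choice of SHA-256 over a canonical JSON serialization for record_key are all assumptions made for illustration. In the proposal itself these classes would be defined in LinkML, not in Python.

```python
import hashlib
import json
from dataclasses import dataclass
from enum import Enum


class ActionType(Enum):
    REPLACE = "REPLACE"
    TRANSFORM = "TRANSFORM"
    NORMALIZE = "NORMALIZE"
    COMPUTE = "COMPUTE"
    COPY = "COPY"
    CONCAT = "CONCAT"
    NONE = "NONE"


class AcceptanceStatus(Enum):
    OK = "OK"
    WARN = "WARN"
    FAIL = "FAIL"


@dataclass(frozen=True)
class SourceElement:
    name: str
    uri: str             # canonical source element URI
    schema_id: str
    schema_version: str
    path: str


@dataclass(frozen=True)
class TargetElement:
    standard_id: str
    version: str
    target_class: str
    path: str            # structural path, e.g. a JSON Pointer


@dataclass(frozen=True)
class TransformationRecord:
    source: SourceElement
    target: TargetElement
    action: ActionType
    status: AcceptanceStatus
    description: str = ""

    def record_key(self) -> str:
        """Hash only identity-bearing fields (an assumed choice), so that
        editing the description or status does not change the key."""
        identity = {
            "source_schema": self.source.schema_id,
            "source_version": self.source.schema_version,
            "source_path": self.source.path,
            "target_standard": self.target.standard_id,
            "target_version": self.target.version,
            "target_path": self.target.path,
        }
        canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical example values; none are taken from the attached YAML files.
record = TransformationRecord(
    source=SourceElement(
        name="start",
        uri="https://example.org/pxf/start",
        schema_id="PXF",
        schema_version="example-version",
        path="variant.start",
    ),
    target=TargetElement(
        standard_id="VRS",
        version="1.3",
        target_class="SequenceLocation",
        path="/location/interval/start",
    ),
    action=ActionType.NORMALIZE,
    status=AcceptanceStatus.OK,
    description="Convert the source start position to the target convention.",
)
print(record.record_key())
```

Which fields participate in the key is itself a governance decision. Including the schema versions, as this sketch does, means a new key is minted whenever either schema version changes.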
Three YAML files are attached, each containing a complete LinkML instance example.
Pilot Use Case
The PXF to VRS proposal can serve as an initial pilot. It already contains representative examples covering:
- direct field replacement
- normalized coordinate transformation
- no-mapping and error cases
- acceptance codes and error taxonomy
- versioned transformation metadata
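The normalized coordinate transformation case can be sketched as follows. VRS uses interbase (0-based, half-open) coordinates. The sketch assumes, purely for illustration, that the source interval is 1-based and fully closed; the actual source convention has to be taken from the source schema. The function names, the result dictionary layout, and the error code `E_INVALID_INTERVAL` are hypothetical.

```python
from typing import Any, Dict, Tuple


def to_interbase(start_1based: int, end_1based: int) -> Tuple[int, int]:
    """Convert a 1-based, fully closed interval to 0-based interbase.

    The start shifts down by one; the end is numerically unchanged because
    a closed end at position N equals a half-open end at N.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("invalid 1-based closed interval")
    return start_1based - 1, end_1based


def normalize_interval(start: int, end: int) -> Dict[str, Any]:
    """Apply the conversion and report an acceptance status with the result."""
    try:
        ib_start, ib_end = to_interbase(start, end)
    except ValueError:
        # Hypothetical error code; a real taxonomy would be defined by the model.
        return {"status": "FAIL", "error_code": "E_INVALID_INTERVAL"}
    return {"status": "OK", "start": ib_start, "end": ib_end}


print(to_interbase(100, 100))      # (99, 100): a single base at position 100
print(normalize_interval(5, 3))    # FAIL result: end precedes start
```

Recording the source and target conventions explicitly in the mapping record, rather than only in code like this, is what would let two implementers confirm that they apply the same offset.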
Representative VRS targets in the pilot include:
- Allele
- SequenceLocation
- Haplotype
- TextVariation
- VariationSet
- CopyNumberChange
- SequenceReference
Allele example.yaml
No-mapping example.yaml
SequenceLocation example.yaml
Estimated Effort Level
Medium (3-6 months, moderate resources)
Success Criteria
Measurable Outcomes:
- TASC agrees on whether LinkML-based transformation artifacts are in scope
- a requirements memo is produced
- a minimum LinkML transformation schema is defined
- at least one pilot PXF to VRS mapping set is published as valid LinkML instances
- the pilot demonstrates direct mapping, normalized transformation, and no-mapping cases
- guidance is produced on publication, versioning, and governance
Key Metrics:
- multiple reviewers can independently understand and evaluate a mapping without reading implementation code
- at least one mapping artifact can be version-diffed and reused by another implementer
- transformation records support validation of representative cases involving identifiers, intervals, and cardinality
- the pilot model is judged reusable beyond a single PXF use case
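As a rough illustration of what "version-diffed" could mean in practice, the sketch below compares two versions of a mapping record that have already been loaded into plain dictionaries (for example, from YAML instances). The record layout and the values shown are hypothetical, and a real workflow might rely on ordinary text diffs of the YAML files instead.

```python
from typing import Any, Dict, Tuple


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dictionaries into dotted paths, e.g. 'target.version'."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_records(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {path: (old_value, new_value)} for every field that differs."""
    flat_old, flat_new = flatten(old), flatten(new)
    return {
        path: (flat_old.get(path), flat_new.get(path))
        for path in sorted(set(flat_old) | set(flat_new))
        if flat_old.get(path) != flat_new.get(path)
    }


# Hypothetical record versions for illustration only.
v1 = {"action": "COPY", "target": {"standard": "VRS", "version": "1.3"}}
v2 = {"action": "NORMALIZE", "target": {"standard": "VRS", "version": "2.0"}}

print(diff_records(v1, v2))
# {'action': ('COPY', 'NORMALIZE'), 'target.version': ('1.3', '2.0')}
```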
Timeline:
- short term: scope decision and requirements memo
- medium term: LinkML schema and pilot mappings
- later: governance recommendation and registry integration path
How will this issue aid GA4GH harmonization?
This issue would create a shared, machine-readable representation for the semantic layer between schemas. That matters because interoperability failures often occur not only at the level of object models, but at the level of translation between them.
A LinkML transformation model would help GA4GH harmonization by:
- making mapping decisions explicit
- enabling review of alignment assumptions
- reducing divergent local implementations
- supporting reuse of mapping artifacts
- creating a foundation for testing, validation, and conformance around transformations
- aligning transformation artifacts with broader data-model and schema-governance practices across GA4GH
In that sense, it complements existing TASC work on schema registry, data model best practices, identifier alignment, and interoperability touchpoints rather than duplicating it.
Additional context
Relevant TASC issues include:
- interoperability points across products, which explicitly mentions harmonized common data types and libraries of transformations
- schema registry, which highlights the need for discoverable, versioned, governed artifacts across the ecosystem
- data model best practices, which emphasizes machine-readable and human-readable artifacts for approved models
- namespace policy and identifier harmonization, which show that shared semantics and reusable identifiers are already treated as GA4GH-wide concerns
Work Streams Raising This Issue
Other Groups Raising This Issue
No response
Work Streams That Will Be Impacted
Other Groups That Will Be Impacted
No response
Key Stakeholders to Consult
Organizations/Communities:
- TASC leadership
- GKS and VRS contributors
- DaMaSC contributors
- driver projects with cross-standard interoperability needs
- external communities responsible for adjacent variation schemas
Technical Experts:
- VRS editors and implementers
- contributors to interoperability, schema registry, and identifier-governance work
- schema registry and metadata-infrastructure experts
- implementers with production mapping and normalization experience
Decision Makers:
- TASC co-leads
- PSC stakeholders as needed
- maintainers of any future registry or TASC-managed artifact repository
Products affected
- VRS
- transformation tooling targeting VRS
- future schema registry or metadata discovery infrastructure
- conformance and validation tooling that depends on explicit mapping semantics
Additional Context
The initial pilot proposal already includes examples of direct replacement, coordinate normalization, and no-equivalent error handling. These can be represented as LinkML instances conforming to a shared transformation schema.
LinkML: https://github.com/linkml
Priority Level
Medium (should be addressed within 3-6 months)
Additional Tags