Before MedGemma touches a single paper, we need a fixed ontology that tells it what to extract and a weighting model that tells downstream consumers how much to trust it. Without the schema, outputs will be inconsistent across papers and impossible to merge. Without evidence weighting, a case report and a 10,000-participant RCT would carry identical authority in the graph — making clinical reasoning over it unreliable.
This issue is on the critical path: every extraction, validation, and query issue depends on the schema contract defined here.
1. Node Types
Each node type must declare: id format, required properties, optional properties, and canonical enum values where applicable.
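| Node Type | Description | Required Properties | Notes |
|---|---|---|---|
| Paper | The source PubMed article; anchor for all provenance and weighting | pmid, title, pub_year, journal, study_design, sample_size, citation_count, evidence_weight (computed) | |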
Note: REPORTED_IN has been renamed to EXTRACTED_FROM to distinguish raw paper content from extraction provenance. Every node and every edge in the graph must carry at least one EXTRACTED_FROM link.
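The provenance rule above can be checked mechanically. The following is a minimal sketch, assuming triples are held in memory as simple records; the class names `Provenance` and `Triple` and the function `provenance_problems` are hypothetical and not part of the schema.

```python
from dataclasses import dataclass, field


@dataclass
class Provenance:
    """One EXTRACTED_FROM link from a triple to a source paper (hypothetical shape)."""
    pmid: str
    extraction_method: str   # "manual" or a MedGemma version tag
    confidence_score: float  # expected to lie in [0, 1]


@dataclass
class Triple:
    source: str
    edge_type: str
    target: str
    extracted_from: list[Provenance] = field(default_factory=list)


def provenance_problems(triples: list[Triple]) -> list[str]:
    """Return one message per violation of the 'at least one EXTRACTED_FROM' rule."""
    problems = []
    for t in triples:
        name = f"({t.source}) -[{t.edge_type}]-> ({t.target})"
        if not t.extracted_from:
            problems.append(f"{name}: no EXTRACTED_FROM link")
        for p in t.extracted_from:
            if not 0.0 <= p.confidence_score <= 1.0:
                problems.append(f"{name}: confidence_score out of range for PMID {p.pmid}")
    return problems
```

A check like this could run after each ingestion batch so that triples without provenance are rejected before they reach the graph.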
3. HormonalPhase Enumeration (STRAW+10 Mapping)
| Label (Graph Enum) | STRAW+10 Stage | FSH Characteristic | Description |
|---|---|---|---|
| REPRODUCTIVE_LATE | -3 | Variable | Subtle fertility decline, regular cycles |
| PERIMENOPAUSE_EARLY | -2 | Elevated | Cycle length variability ≥7 days |
| PERIMENOPAUSE_LATE | -1* | >25 IU/L | Amenorrhea ≥60 days, interval of skipped cycles |
| MENOPAUSE | 0 | - | Final menstrual period (retrospective, 12 mo amenorrhea) |
| POSTMENOPAUSE_EARLY | +1a / +1b / +1c | Stabilizing high | 0–6 years post-FMP |
| POSTMENOPAUSE_LATE | +2 | Stable high | >6 years post-FMP |
| SURGICAL_MENOPAUSE | N/A (map to +1a equivalent) | Variable | Bilateral oophorectomy ± hysterectomy |
| CHEMOTHERAPY_INDUCED | N/A | Variable | Iatrogenic ovarian failure |
| PREMATURE_OVARIAN_INSUFFICIENCY | N/A | Elevated | Spontaneous menopause <40 years |
Schema must store both label and straw_stage so queries can work at either level of granularity.
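As an illustration of why both fields are stored, here is a minimal sketch that keeps `label` and `straw_stage` side by side and supports lookup in either direction. Only a subset of the enumeration is shown, and the names `STRAW_STAGES`, `make_phase_node`, and `labels_for_stage` are hypothetical.

```python
from typing import Optional

# Subset of the enumeration: graph label -> allowed STRAW+10 stages.
# Non-STRAW categories map to an empty tuple.
STRAW_STAGES: dict[str, tuple[str, ...]] = {
    "PERIMENOPAUSE_EARLY": ("-2",),
    "MENOPAUSE": ("0",),
    "POSTMENOPAUSE_EARLY": ("+1a", "+1b", "+1c"),
    "POSTMENOPAUSE_LATE": ("+2",),
    "CHEMOTHERAPY_INDUCED": (),
}


def make_phase_node(label: str, straw_stage: Optional[str] = None) -> dict:
    """Build a HormonalPhase node that stores both label and straw_stage."""
    if label not in STRAW_STAGES:
        raise ValueError(f"unknown HormonalPhase label: {label}")
    if straw_stage is not None and straw_stage not in STRAW_STAGES[label]:
        raise ValueError(f"stage {straw_stage} is not valid for {label}")
    return {"label": label, "straw_stage": straw_stage}


def labels_for_stage(stage: str) -> list[str]:
    """Coarse-grained lookup: which labels cover a given STRAW+10 stage?"""
    return [label for label, stages in STRAW_STAGES.items() if stage in stages]
```

A query for stage `+1b` then resolves to the coarser `POSTMENOPAUSE_EARLY` label, while a node created with that label can still record the finer substage when a paper reports it.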
4. Paper Evidence Weighting Model
Each Paper node must carry a computed evidence_weight (float, 0.0–1.0) derived from the following components:
| Component | Weight | Scoring |
|---|---|---|
| Sample Size | | Log-scaled: min(1.0, log10(n) / log10(10000)), i.e., n=10→0.25, n=100→0.5, n=1000→0.75, n≥10000→1.0. For meta-analyses, use total pooled N. |
| Recency | 0.10 | max(0, 1 - (current_year - pub_year) / 30); papers >30 yrs old score 0; landmark papers can be manually overridden. |
| Journal Impact Proxy | 0.10 | Normalized citation rate: min(1.0, citations_per_year / field_median_cpy). Bootstrap field median from initial corpus. |
| Replication Signal | 0.15 | Number of other papers in the graph whose extracted triples corroborate ≥1 triple from this paper. Normalized 0–1. (Computed post-ingestion, default 0 on first pass.) |
| MedGemma Extraction Confidence | 0.10 | Mean confidence_score across all EXTRACTED_FROM edges originating from this paper. |
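The scoring rules above translate directly into small functions. The sketch below is illustrative only: the function names are hypothetical, the upper clamp in `recency_score` and the zero-guards are additions not stated in the rules, and the `study_design` and `sample_size` weights in the example dictionary are placeholders chosen so the weights sum to 1.0 (only the 0.10 / 0.10 / 0.15 / 0.10 weights come from this document).

```python
import math


def sample_size_score(n: int) -> float:
    """Log-scaled sample size: n=10 -> 0.25, n=100 -> 0.5, n>=10000 -> 1.0."""
    if n < 1:
        return 0.0  # guard added here; the rule does not cover n < 1
    return min(1.0, math.log10(n) / math.log10(10000))


def recency_score(pub_year: int, current_year: int) -> float:
    """Linear decay over 30 years; the upper clamp at 1.0 is an added safeguard."""
    return min(1.0, max(0.0, 1 - (current_year - pub_year) / 30))


def citation_score(citations_per_year: float, field_median_cpy: float) -> float:
    """Citation rate normalized by the field median, capped at 1.0."""
    if field_median_cpy <= 0:
        return 0.0  # guard added here; avoids division by zero
    return min(1.0, citations_per_year / field_median_cpy)


def composite_weight(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of component scores; weights are expected to sum to 1.0."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("component weights must sum to 1.0")
    return sum(weights[name] * scores[name] for name in weights)


# study_design and sample_size weights are PLACEHOLDERS, not values from the spec.
EXAMPLE_WEIGHTS = {
    "study_design": 0.35,           # placeholder
    "sample_size": 0.20,            # placeholder
    "recency": 0.10,
    "journal_impact": 0.10,
    "replication": 0.15,
    "extraction_confidence": 0.10,
}
```

Keeping the weights in a single dictionary means the real values can be loaded from the schema file once they are fixed, without touching the scoring functions.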
When the same triple is extracted from multiple papers, its aggregate confidence is combined by noisy-OR: more independent high-quality sources yield higher aggregate confidence, with diminishing returns.
Query APIs and visualization layers should expose both per-paper weight and aggregate triple confidence.
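For reference, the standard noisy-OR combination is 1 minus the product of (1 - s_i) over all sources. The sketch below assumes each per-source score s_i is the source paper's evidence_weight; multiplying it by the edge's confidence_score would be another reasonable choice. The function name `aggregate_confidence` is hypothetical.

```python
def aggregate_confidence(source_scores: list[float]) -> float:
    """Noisy-OR over per-source scores: 1 - prod(1 - s_i).

    Each score is assumed to be a paper's evidence_weight in [0, 1].
    An empty list yields 0.0 (no supporting source).
    """
    remaining_doubt = 1.0
    for score in source_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        remaining_doubt *= 1.0 - score
    return 1.0 - remaining_doubt
```

Two sources at 0.5 each combine to 0.75, and adding a third at 0.5 raises the result only to 0.875, which is the diminishing-returns behavior described above.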
4d. Override Mechanism
Some landmark papers (e.g., SWAN cohort publications, Kronos Early Estrogen Prevention Study) may deserve manual weight overrides. Schema must support an optional weight_override field on the Paper node with a justification string.
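One possible shape for the override is sketched below. It assumes the override replaces the computed weight outright and that an override without a justification is rejected; both behaviors are assumptions, and the `Paper` class and `effective_weight` method are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paper:
    pmid: str
    evidence_weight: float                          # computed composite weight
    weight_override: Optional[float] = None         # manual override, if any
    weight_override_justification: Optional[str] = None

    def __post_init__(self) -> None:
        if self.weight_override is None:
            return
        if not 0.0 <= self.weight_override <= 1.0:
            raise ValueError("weight_override must lie in [0, 1]")
        if not (self.weight_override_justification or "").strip():
            raise ValueError("weight_override requires a justification")

    def effective_weight(self) -> float:
        """Return the override when present, otherwise the computed weight."""
        if self.weight_override is not None:
            return self.weight_override
        return self.evidence_weight
```

Keeping the computed `evidence_weight` alongside the override preserves the audit trail: consumers can see both what the formula produced and what a reviewer decided.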
5. Schema File Format
Deliver as schema/menopause_kg_schema.json conforming to this top-level structure:
Acceptance Criteria
schema/menopause_kg_schema.json exists, is valid JSON, and passes the JSON Schema meta-validation in tests/test_schema.py
All 9 node types documented with required properties, optional properties, and enum values where applicable
All 9 edge types documented with directionality, required/optional edge properties, and one example triple each
HormonalPhase values enumerated and mapped to STRAW+10 stages (including non-STRAW categories: surgical, chemo-induced, POI)
Evidence weighting model fully specified: component scores, composite formula, propagation rule, and override mechanism
Paper node carries evidence_weight as a required computed field with clear scoring rubric for each component
Schema is versioned (0.1.0, semver) and every extraction record will reference schema_version
A CHANGELOG.md stub is created in schema/ to track future schema evolution
At least 2 team members review and approve the schema before merge (to prevent premature lock-in)
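Several of these criteria lend themselves to automated checks. The following is a standard-library sketch of structural checks on the loaded schema document; it is not the JSON Schema meta-validation itself and does not reproduce tests/test_schema.py, whose contents are not shown here. The function name `schema_problems` is hypothetical, and the key names follow the top-level structure in section 5.

```python
import re

REQUIRED_TOP_LEVEL = (
    "schema_version",
    "node_types",
    "edge_types",
    "evidence_weighting",
    "hormonal_phase_enum",
)
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def schema_problems(doc: dict) -> list[str]:
    """Return human-readable problems found in a loaded schema document."""
    problems = [f"missing top-level key: {key}" for key in REQUIRED_TOP_LEVEL if key not in doc]
    if not SEMVER.match(str(doc.get("schema_version", ""))):
        problems.append("schema_version is not MAJOR.MINOR.PATCH")
    for name, node in doc.get("node_types", {}).items():
        if not isinstance(node.get("required"), list):
            problems.append(f"node type {name}: 'required' must be a list")
    for name, edge in doc.get("edge_types", {}).items():
        for key in ("source", "target", "directed"):
            if key not in edge:
                problems.append(f"edge type {name}: missing '{key}'")
    return problems
```

In a test, the file would be read with `json.load` and the result of `schema_problems` asserted to be empty.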
Open Questions (resolve before or during review)
Granularity of StudyPopulation: Should ethnicity be a free-text summary or a controlled vocabulary (e.g., NIH categories)? Leaning toward controlled vocab + other_text escape hatch.
Temporal edges: Some relationships are time-dependent (e.g., symptom severity changes across phases). Do we model this as edge properties or as separate TemporalObservation nodes? Propose deferring to v0.2 unless the team feels strongly.
Negative results: Papers that find no association are as important as positive findings. The PREDICTS / ASSOCIATED_WITH edges support no_effect / unclear values, but should we add an explicit CONTRADICTS edge type for direct conflict tracking? Propose including it in v0.1 to capture this from the start.
1. Node Types
Each node type must declare: id format, required properties, optional properties, and canonical enum values where applicable.
Properties per node type, one line per type:
pmid, title, pub_year, journal, study_design, sample_size, citation_count, evidence_weight (computed)
label, straw_stage, description
label, mesh_id (optional), category (vasomotor / cognitive / mood / musculoskeletal / sleep / urogenital)
label, unit, specimen_type (serum, saliva, CSF, etc.)
label, hemisphere (L/R/bilateral/NA), atlas_id (optional)
label, domain (memory, executive, attention, processing_speed, verbal_fluency, spatial)
label, type (pharmacological / surgical / lifestyle / supplement / psychotherapy), route (oral, transdermal, etc., if applicable)
age_range, mean_age, n, ethnicity, inclusion_criteria_summary, menopausal_status
symbol, rsid (optional), gene_id (optional)
2. Edge Types
Every edge must declare: source type → target type, directionality, required edge properties, and an example triple.
| Edge Type | Edge Properties | Example Triple |
|---|---|---|
| MODULATES | direction (increases / decreases / fluctuates / unclear), magnitude (if reported) | (Late Perimenopause) -[MODULATES {direction: "decreases"}]→ (Estradiol) |
| ASSOCIATED_WITH | correlation_direction (+/-/unclear), imaging_modality (fMRI, PET, sMRI, etc.) | (Brain Fog) -[ASSOCIATED_WITH {correlation_direction: "-", imaging_modality: "fMRI"}]→ (Prefrontal Cortex) |
| PRESENTS_WITH | prevalence (if reported), severity_scale (if reported) | (Early Postmenopause) -[PRESENTS_WITH {prevalence: "60-80%"}]→ (Hot Flashes) |
| PREDICTS | association_direction (+/-), p_value (optional), effect_size (optional) | (E2) -[PREDICTS {association_direction: "+", p_value: 0.003}]→ (Verbal Memory) |
| TESTED_IN | outcome_measure, result_summary (improved / no_effect / worsened) | (Transdermal E2 HRT) -[TESTED_IN {result_summary: "improved"}]→ (Peri Women 45-55) |
| AFFECTS | effect (alleviates / worsens / no_effect), effect_size (optional) | (SSRI) -[AFFECTS {effect: "alleviates"}]→ (Hot Flashes) |
| INFLUENCES | mechanism (neuroprotective / neuroinflammatory / neurotrophic / unclear) | (BDNF) -[INFLUENCES {mechanism: "neurotrophic"}]→ (Hippocampus) |
| INTERACTS_WITH | interaction_type (efficacy_modifier / risk_modifier / metabolic) | (CYP2D6 poor metabolizer) -[INTERACTS_WITH {interaction_type: "efficacy_modifier"}]→ (Tamoxifen) |
| EXTRACTED_FROM | extraction_method (manual / MedGemma_v*), confidence_score [0–1], text_span (source sentence) | (triple: E2 PREDICTS Verbal Memory) -[EXTRACTED_FROM {confidence_score: 0.92}]→ (PMID:33456789) |
3. HormonalPhase Enumeration (STRAW+10 Mapping)
Labels: REPRODUCTIVE_LATE, PERIMENOPAUSE_EARLY, PERIMENOPAUSE_LATE, MENOPAUSE, POSTMENOPAUSE_EARLY, POSTMENOPAUSE_LATE, SURGICAL_MENOPAUSE, CHEMOTHERAPY_INDUCED, PREMATURE_OVARIAN_INSUFFICIENCY
Schema must store both label and straw_stage so queries can work at either level of granularity.
4. Paper Evidence Weighting Model
Each Paper node must carry a computed evidence_weight (float, 0.0–1.0) derived from the following components:
4a. Component Scores
Study design: meta_analysis / systematic_review: 1.0 · rct: 0.9 · prospective_cohort: 0.7 · cross_sectional: 0.5 · case_control: 0.4 · case_report / case_series: 0.2 · narrative_review / editorial: 0.15 · animal / in_vitro: 0.1
Sample size: min(1.0, log10(n) / log10(10000)), i.e., n=10→0.25, n=100→0.5, n=1000→0.75, n≥10000→1.0. For meta-analyses, use total pooled N.
Recency: max(0, 1 - (current_year - pub_year) / 30); papers >30 yrs old score 0; landmark papers can be manually overridden.
Journal impact proxy: min(1.0, citations_per_year / field_median_cpy). Bootstrap field median from initial corpus.
MedGemma extraction confidence: mean confidence_score across all EXTRACTED_FROM edges originating from this paper.
4b. Composite Formula
4c. How Weight Propagates
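Each EXTRACTED_FROM edge carries the source paper's evidence_weight.
When a triple (e.g., E2 -[PREDICTS]→ Verbal Memory) is extracted from multiple papers, the triple's aggregate confidence is combined by noisy-OR, so more independent high-quality sources yield higher aggregate confidence, with diminishing returns.
4d. Override Mechanism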
Some landmark papers (e.g., SWAN cohort publications, Kronos Early Estrogen Prevention Study) may deserve manual weight overrides. Schema must support an optional weight_override field on the Paper node with a justification string.
5. Schema File Format
Deliver as schema/menopause_kg_schema.json conforming to this top-level structure:

```json
{
  "schema_version": "0.1.0",
  "semver_note": "MAJOR.MINOR.PATCH — bump MINOR for new node/edge types, PATCH for property additions",
  "node_types": {
    "Paper": {
      "required": ["pmid", "title", "pub_year", "study_design", "evidence_weight"],
      "optional": ["journal", "sample_size", "citation_count", "weight_override", "weight_override_justification"],
      "enums": {
        "study_design": ["meta_analysis", "systematic_review", "rct", "prospective_cohort", "cross_sectional", "case_control", "case_report", "narrative_review", "editorial", "animal_in_vitro"]
      }
    },
    // ... remaining node types per §1
  },
  "edge_types": {
    "MODULATES": {
      "source": "HormonalPhase",
      "target": "Biomarker",
      "directed": true,
      "required_properties": ["direction"],
      "optional_properties": ["magnitude"],
      "example": { /* ... */ }
    },
    // ... remaining edge types per §2
  },
  "evidence_weighting": {
    "components": { /* per §4a */ },
    "formula": "weighted_sum",
    "aggregation": "noisy_or"
  },
  "hormonal_phase_enum": { /* per §3 */ }
}
```