Summary
Setting mean_denorm.enabled: true or mean_norm.enabled: true on an inline-inference aggregator config has no effect — the metric is silently stripped at build time with no warning to the user. The same silent override exists in both the ACE and coupled trainers, by different mechanisms. Users have no way to know their configuration was ignored.
Reproduction
In any TrainConfig yaml, add to an inline-inference entry:
inference:
  - name: my_run
    aggregator:
      mean_denorm:
        enabled: true
        target: denorm
      mean_norm:
        enabled: true
        target: norm
Run training. No global-mean time-series metrics appear in W&B or in saved diagnostics, and no warning is logged.
Root cause
ACE trainer
fme/ace/train/train.py:144-155 calls entry_config.aggregator.build(..., enable_time_series=False).
build_inference_evaluator_aggregator (fme/ace/aggregator/inference/main.py:127-128) then does:
if not enable_time_series:
    metrics = [m for m in metrics if not isinstance(m, MeanMetricConfig)]
So any mean_denorm / mean_norm entry with enabled=True is silently dropped.
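The effect of that filter can be reproduced in isolation. The following is a minimal sketch, not the fme code: MeanMetricConfig here is a stand-in dataclass with assumed fields, and StepMeansConfig and select_metrics are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MeanMetricConfig:
    # Stand-in for the real class; the field names are assumptions.
    target: str
    enabled: bool = True


@dataclass
class StepMeansConfig:
    # Hypothetical stand-in for any metric that has no time-series output.
    enabled: bool = True


def select_metrics(metrics, enable_time_series=True):
    """Apply the same isinstance filter as the snippet quoted above."""
    if not enable_time_series:
        # Mean metrics are dropped here; nothing is logged or raised.
        metrics = [m for m in metrics if not isinstance(m, MeanMetricConfig)]
    return metrics


requested = [
    MeanMetricConfig(target="denorm"),
    MeanMetricConfig(target="norm"),
    StepMeansConfig(),
]
kept = select_metrics(requested, enable_time_series=False)
kept_names = [type(m).__name__ for m in kept]
print(kept_names)
```

Both mean entries were requested with enabled=True, yet only the non-mean metric survives, and the caller receives no signal that anything was removed.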
Coupled trainer
fme/coupled/aggregator.py:218 uses the legacy flat-flag InferenceEvaluatorAggregatorConfig with log_global_mean_time_series: bool = True / log_global_mean_norm_time_series: bool = True.
InlineInferenceConfig.__post_init__ (fme/coupled/train/train_config.py:116-123) silently overwrites both to False if the user set them to True.
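A minimal sketch of the coupled mechanism, under the assumption that the override is a plain attribute assignment: LegacyAggregatorSketch and InlineInferenceSketch are hypothetical stand-ins, and only the two flag names are taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class LegacyAggregatorSketch:
    # Flag names match the legacy flat-flag config described above.
    log_global_mean_time_series: bool = True
    log_global_mean_norm_time_series: bool = True


@dataclass
class InlineInferenceSketch:
    # Hypothetical stand-in for the coupled InlineInferenceConfig.
    aggregator: LegacyAggregatorSketch = field(
        default_factory=LegacyAggregatorSketch
    )

    def __post_init__(self):
        # Silent flip: the user's True becomes False with no log or error.
        if self.aggregator.log_global_mean_time_series:
            self.aggregator.log_global_mean_time_series = False
        if self.aggregator.log_global_mean_norm_time_series:
            self.aggregator.log_global_mean_norm_time_series = False


user_aggregator = LegacyAggregatorSketch(
    log_global_mean_time_series=True,
    log_global_mean_norm_time_series=True,
)
cfg = InlineInferenceSketch(aggregator=user_aggregator)
print(cfg.aggregator.log_global_mean_time_series)
print(cfg.aggregator.log_global_mean_norm_time_series)
```

The mechanism differs from the ACE path (mutation at config construction rather than filtering at build time), but the user-visible result is the same: the requested metric disappears without notice.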
Scope
MeanMetricConfig is the only metric whose build() populates the time_series slot of MetricBuildResult (fme/ace/aggregator/inference/reduced.py:537). All other metrics (time_mean_*, step_means, power_spectrum, annual, etc.) pass through enable_time_series=False unaffected. The gate therefore affects only the global-mean time series.
Why the gate exists
The mean time series is reported as a W&B Table whose rows correspond to forecast steps. Inline-inference logging is per-epoch and doesn't model per-step table rows. The standalone inference entrypoint (fme/ace/inference/inference.py:343), by contrast, calls config.aggregator.build(...) without passing enable_time_series, and there the time series flow through correctly. The behavior is intentional but invisible to users.
Codified in tests
fme/ace/aggregator/inference/test_evaluator.py:512 (test_enable_time_series_false) asserts the silent strip is the expected behavior.
Recommended fix
Per AGENTS.md ("Validate in __post_init__, not at runtime"), raise on misconfiguration in InlineInferenceConfig.__post_init__:
- fme/ace/train/train_config.py:75-87: raise ValueError if aggregator.mean_denorm.enabled or aggregator.mean_norm.enabled is true, with a message pointing the user to the standalone inference entrypoint.
- fme/coupled/train/train_config.py:116-123: replace the silent flip with the same ValueError.
- Update fme/ace/aggregator/inference/test_evaluator.py:512 to assert the new error path, and add a corresponding test on InlineInferenceConfig.
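The proposed validation could look roughly like the sketch below. It is illustrative only: the dataclasses are simplified stand-ins for the real config classes, the default of enabled=False and the error message wording are assumptions, and the actual field layout in fme may differ.

```python
from dataclasses import dataclass, field


@dataclass
class MeanMetricConfig:
    # Stand-in; the real defaults may differ (assumption).
    enabled: bool = False
    target: str = "denorm"


@dataclass
class AggregatorConfig:
    # Hypothetical simplified aggregator config with only the two mean metrics.
    mean_denorm: MeanMetricConfig = field(default_factory=MeanMetricConfig)
    mean_norm: MeanMetricConfig = field(
        default_factory=lambda: MeanMetricConfig(target="norm")
    )


@dataclass
class InlineInferenceConfig:
    name: str
    aggregator: AggregatorConfig = field(default_factory=AggregatorConfig)

    def __post_init__(self):
        # Fail at config construction instead of dropping the metric later.
        enabled = [
            key
            for key in ("mean_denorm", "mean_norm")
            if getattr(self.aggregator, key).enabled
        ]
        if enabled:
            raise ValueError(
                f"Inline inference '{self.name}' cannot log global-mean time "
                f"series, but these metrics are enabled: {enabled}. Disable "
                "them or use the standalone inference entrypoint."
            )


# Usage: a default config is accepted, an enabled mean metric is rejected.
accepted = InlineInferenceConfig(name="fine")
try:
    InlineInferenceConfig(
        name="my_run",
        aggregator=AggregatorConfig(mean_denorm=MeanMetricConfig(enabled=True)),
    )
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```

Raising at construction time surfaces the problem when the yaml is loaded, before any training compute is spent, which is the point of the AGENTS.md guideline quoted above.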
Optional follow-up cleanup
Once inline inference can never pass a MeanMetricConfig into the build function, the enable_time_series parameter on build_inference_evaluator_aggregator has only one (test) call site and could be removed. Not required for the bug fix.
Alternatives considered
- Warn + silent override: matches today's coupled behavior, but warnings get lost in training logs and the issue resurfaces.
- Move mean_* to a separate standalone-only config class: cleaner long-term API separation, but more churn, touching every config file that uses InferenceEvaluatorAggregatorConfig. Not worth it for this bug.