
Inline inference silently ignores mean_denorm/mean_norm aggregator config #1181

Description

@mcgibbon

Summary

Setting mean_denorm.enabled: true or mean_norm.enabled: true on an inline-inference aggregator config has no effect: the metric is stripped at build time with no warning. The same silent override exists in both the ACE and coupled trainers, by different mechanisms, so users have no way to know their configuration was ignored.

Reproduction

In any TrainConfig yaml, add to an inline-inference entry:

inference:
  - name: my_run
    aggregator:
      mean_denorm:
        enabled: true
        target: denorm
      mean_norm:
        enabled: true
        target: norm

Run training. No global-mean time-series metrics appear in W&B or in saved diagnostics, and no warning is logged.

Root cause

ACE trainer

  • fme/ace/train/train.py:144-155 calls entry_config.aggregator.build(..., enable_time_series=False).
  • build_inference_evaluator_aggregator (fme/ace/aggregator/inference/main.py:127-128) then does:
    if not enable_time_series:
        metrics = [m for m in metrics if not isinstance(m, MeanMetricConfig)]
  • So any mean_denorm / mean_norm entry with enabled=True is silently dropped.
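
The filtering step can be reproduced in isolation. The sketch below uses simplified stand-in dataclasses (the `MeanMetricConfig` and `TimeMeanMetricConfig` defined here, their `name` field, and the `filter_metrics` helper are illustrative assumptions, not the real fme classes) to show how an enabled mean metric disappears without any diagnostic:

```python
from dataclasses import dataclass


@dataclass
class MeanMetricConfig:
    """Stand-in for the real config; fields are illustrative assumptions."""

    name: str
    enabled: bool = True


@dataclass
class TimeMeanMetricConfig:
    """Stand-in for any metric that does not produce a time series."""

    name: str
    enabled: bool = True


def filter_metrics(metrics, enable_time_series):
    # Mirrors the two lines quoted above: mean metrics are dropped
    # without logging or raising, even when the user enabled them.
    if not enable_time_series:
        metrics = [m for m in metrics if not isinstance(m, MeanMetricConfig)]
    return metrics


requested = [
    MeanMetricConfig("mean_denorm"),
    MeanMetricConfig("mean_norm"),
    TimeMeanMetricConfig("time_mean"),
]
kept = filter_metrics(requested, enable_time_series=False)
kept_names = [m.name for m in kept]
```

Only the non-mean metric survives in `kept_names`, and nothing in the call tells the caller that two enabled entries were removed.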

Coupled trainer

  • fme/coupled/aggregator.py:218 uses the legacy flat-flag InferenceEvaluatorAggregatorConfig with log_global_mean_time_series: bool = True / log_global_mean_norm_time_series: bool = True.
  • InlineInferenceConfig.__post_init__ (fme/coupled/train/train_config.py:116-123) silently overwrites both to False if the user set them to True.
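
As a rough illustration of the coupled mechanism, the following sketch shows a `__post_init__` that flips the flags without telling anyone. The two flag names and their defaults come from the description above; the classes themselves are simplified stand-ins and the real implementation may be structured differently:

```python
from dataclasses import dataclass, field


@dataclass
class AggregatorConfig:
    # Flag names and defaults as described in this issue; the class is a stand-in.
    log_global_mean_time_series: bool = True
    log_global_mean_norm_time_series: bool = True


@dataclass
class InlineInferenceConfig:
    aggregator: AggregatorConfig = field(default_factory=AggregatorConfig)

    def __post_init__(self):
        # Silent override: the user's True is replaced by False,
        # with no warning and no error.
        if self.aggregator.log_global_mean_time_series:
            self.aggregator.log_global_mean_time_series = False
        if self.aggregator.log_global_mean_norm_time_series:
            self.aggregator.log_global_mean_norm_time_series = False


user_aggregator = AggregatorConfig(
    log_global_mean_time_series=True,
    log_global_mean_norm_time_series=True,
)
config = InlineInferenceConfig(aggregator=user_aggregator)
```

After construction both flags read False, even though the user passed True for each.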

Scope

MeanMetricConfig is the only metric whose build() populates the time_series slot of MetricBuildResult (fme/ace/aggregator/inference/reduced.py:537). All other metrics (time_mean_*, step_means, power_spectrum, annual, etc.) pass through enable_time_series=False unaffected. So the gate affects only the global-mean time series.

Why the gate exists

The mean time series is reported as a W&B Table whose rows correspond to forecast steps. Inline-inference logging is per-epoch and doesn't model per-step table rows. The standalone inference entrypoint (fme/ace/inference/inference.py:343), by contrast, calls config.aggregator.build(...) without passing enable_time_series, and there the time series flow through correctly. The behavior is intentional but invisible to users.

Codified in tests

fme/ace/aggregator/inference/test_evaluator.py:512 (test_enable_time_series_false) asserts the silent strip is the expected behavior.

Recommended fix

Per AGENTS.md ("Validate in __post_init__, not at runtime"), raise on misconfiguration in InlineInferenceConfig.__post_init__:

  1. fme/ace/train/train_config.py:75-87 — raise ValueError if aggregator.mean_denorm.enabled or aggregator.mean_norm.enabled is true, with a message pointing the user to the standalone inference entrypoint.
  2. fme/coupled/train/train_config.py:116-123 — replace the silent flip with the same ValueError.
  3. Update fme/ace/aggregator/inference/test_evaluator.py:512 to assert the new error path, and add a corresponding test on InlineInferenceConfig.
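
The validation in steps 1 and 2 could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the dataclasses are stand-ins for the real configs, the `enabled` default and the `name` field are guesses, the error wording is only a suggestion, and `validation_error` is a hypothetical helper used to exercise the check.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MeanMetricConfig:
    # Stand-in; the default for `enabled` here is an assumption.
    enabled: bool = False


@dataclass
class AggregatorConfig:
    mean_denorm: MeanMetricConfig = field(default_factory=MeanMetricConfig)
    mean_norm: MeanMetricConfig = field(default_factory=MeanMetricConfig)


@dataclass
class InlineInferenceConfig:
    name: str
    aggregator: AggregatorConfig = field(default_factory=AggregatorConfig)

    def __post_init__(self):
        # Collect every time-series metric the user explicitly enabled.
        offending = [
            key
            for key in ("mean_denorm", "mean_norm")
            if getattr(self.aggregator, key).enabled
        ]
        if offending:
            raise ValueError(
                f"Inline inference entry '{self.name}' enables "
                f"{', '.join(offending)}, but global-mean time series are not "
                "supported during inline inference. Disable these metrics or "
                "use the standalone inference entrypoint."
            )


def validation_error(config_kwargs: dict) -> Optional[str]:
    """Return the validation message, or None if the config is accepted."""
    try:
        InlineInferenceConfig(**config_kwargs)
    except ValueError as err:
        return str(err)
    return None


bad = validation_error(
    {
        "name": "my_run",
        "aggregator": AggregatorConfig(mean_norm=MeanMetricConfig(enabled=True)),
    }
)
good = validation_error({"name": "my_run"})
```

Failing at construction time means the misconfiguration surfaces when the yaml is loaded, before any training compute is spent. A test for step 3 would assert that constructing the config raises ValueError and that the message names the offending metric.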

Optional follow-up cleanup

Once inline inference can never pass a MeanMetricConfig into the build function, the enable_time_series parameter on build_inference_evaluator_aggregator has only one (test) call site and could be removed. Not required for the bug fix.

Alternatives considered

  • Warn + silent override: matches today's coupled behavior, but warnings get lost in training logs and the issue resurfaces.
  • Move mean_* to a separate standalone-only config class: cleaner long-term API separation, but a bigger churn touching every config file that uses InferenceEvaluatorAggregatorConfig. Not worth it for this bug.
