Skip to content

Rust-style source-located diagnostics for TOML config validation errors #945

Description

@rutayan-nv

Summary

When a test/scenario TOML fails Pydantic validation, the user gets a flattened message with no source location — no file line, no pointer to the offending key. For cross-field errors (e.g. the env_paramscmd_args contract added in #901) the user has to manually grep the file to find which key and which annotation are in conflict. We can do much better with rustc/miette-style diagnostics that quote the exact lines and point carets at the conflict.

Motivation (current UX)

Take a config where cmd_args.osl is a scalar but is annotated [env_params.osl] (rejected at parse time since #901). Today:

ERROR Failed to parse test spec: 'conf/staging/.../dse_ppo.toml'
	Validation error: Value error, env_params['osl'] annotates cmd_args.osl, which is not a candidate list (got int); the annotation only reclassifies a list-valued sweep as env-sampled, while a scalar is already fixed. Make cmd_args.osl a list or remove the annotation

There is no line number. In a 200-line config with several [env_params.*] blocks, the user still has to hunt for both the osl = 512 line and the [env_params.osl] block.

Proposed enhancement

Render a Rust-style, source-located diagnostic that pinpoints both lines in conflict:

error: env_params['osl'] annotates cmd_args.osl, which is not a candidate list (got int)
  --> conf/staging/.../dse_ppo.toml:63
   |
38 | osl = 512
   |       ^^^ cmd_args.osl defined here
...
63 | [env_params.osl]
   | ^^^^^^^^^^^^^^^^^ env_params['osl'] annotation declared here
   |
   = help: the annotation only reclassifies a list-valued sweep as env-sampled, while a scalar is already fixed. Make cmd_args.osl a list or remove the annotation

The value: the user sees the file:line, the two conflicting lines quoted, carets on the exact spans, and the fix in a help: footer — no grepping.

Design notes

  • Positions don't survive parsing. toml.load returns a plain dict, so line/col are lost. The fix is to re-read the source text on the error path (the parser already holds self.current_file in TestParser.load_test_definition) and locate the keys by scanning, then render.
  • Keep validators source-agnostic. Pydantic validators (e.g. TestDefinition.validate_env_params) raise the structured error; the source-located rendering lives in the parse/format layer (toml_utils + test_parser). Construction without a file (DSE overlay, programmatic use) still gets today's plain message.
  • Structured errors over string-matching. Raising PydanticCustomError("env_params_scalar_target", msg, {"key": name}) lets the formatter key off err["type"]/ctx instead of regex-matching the message text (more robust than sniffing the human string).
  • Graceful fallback. When a span can't be located (e.g. inline-table cmd_args = { osl = 512 } forms), fall back to today's plain message — never crash the error path.
  • Generalizable. env_params is the first beneficiary, but the same machinery (map a Pydantic err["loc"] path → TOML key span → caret) applies to any field validation error, plus we already emit located messages for TOML syntax errors (format_toml_decode_error). This unifies both under one renderer.

Scope / effort

Small and additive (~150–200 LoC): a TOML key/section locator + a multi-span caret renderer in toml_utils.py, wired into TestParser.load_test_definition, behind the existing ValidationError path. No behavior change to successful parses.

Severity

Enhancement (DX/UX). Pure quality-of-life; highest leverage for cross-field config contracts like the env_params annotations introduced in #901.

Code: src/cloudai/test_parser.py (load_test_definition), src/cloudai/toml_utils.py.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions