Add validation visualisation functions #71

Merged: fcorowe merged 7 commits into main from codex/validation-vignette-aim on Jun 19, 2026

fcorowe (Collaborator) commented Jun 15, 2026

Purpose

This PR adds package-level visualisation functions for displaying debiasR
validation metrics. The goal is to make validation results easier to inspect,
compare, and communicate across adjustment methods, raw MPD baselines, and
benchmark flows.

The visualisations are designed for validation outputs returned by the
validate_flow_*() functions. They provide exported plotting functions for
reviewing model fit, residual distributions, distributional allocation metrics,
and residual-structure diagnostics.

Rendered Proposal

I have included a rendered HTML version of the full visual proposal in this PR.
The rendered file is committed at
notes/project-management/VALIDATION_VISUAL_PROTOTYPES.html together with its
companion VALIDATION_VISUAL_PROTOTYPES_files/ asset folder, so that the figures
display correctly.

Summary

This PR:

  • Adds canonical validation comparison IDs:
    • adjusted_vs_benchmark
    • raw_vs_benchmark
    • raw_vs_adjusted
  • Sets adjusted_vs_benchmark as the default comparison.
  • Adds exported plot_validation_*() functions for:
    • Level 1 overall fit metrics
    • Level 2/3 residual distributions
    • Level 2/3 pairwise flow scatterplots
    • Level 3 residual severity bands
    • Level 4 distributional divergence comparisons
    • Level 5 overall residual-structure diagnostics
    • Level 5 LISA cluster maps using user-supplied sf boundaries
  • Integrates raw MPD vs benchmark as a baseline within the same plots where
    useful.
  • Adds controls for comparison type, adjustment methods, error measures,
    sorting, and residual-band method.
  • Updates documentation, tests, NEWS, project notes, and the validation visual
    prototype notebook.
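The comparison handling listed above (three canonical IDs, adjusted_vs_benchmark as the default) could look roughly like the following. This is a minimal illustrative sketch in Python rather than the package's R code; the helper name resolve_comparisons() is hypothetical and not part of the debiasR API. It shows the default fallback, de-duplication, and loud rejection of unknown IDs.

```python
# Canonical comparison IDs as listed in this PR.
CANONICAL_COMPARISONS = ("adjusted_vs_benchmark", "raw_vs_benchmark", "raw_vs_adjusted")
DEFAULT_COMPARISON = "adjusted_vs_benchmark"


def resolve_comparisons(comparisons=None):
    """Hypothetical helper: return a validated tuple of comparison IDs.

    None falls back to the default; duplicates are dropped while keeping order;
    unknown IDs raise ValueError so a typo fails loudly instead of plotting nothing.
    """
    if comparisons is None:
        return (DEFAULT_COMPARISON,)
    if isinstance(comparisons, str):
        comparisons = [comparisons]
    unknown = [c for c in comparisons if c not in CANONICAL_COMPARISONS]
    if unknown:
        raise ValueError(f"Unknown comparison ID(s): {', '.join(unknown)}")
    # dict.fromkeys keeps first-seen order while removing duplicates
    return tuple(dict.fromkeys(comparisons))
```

For example, resolve_comparisons() returns ("adjusted_vs_benchmark",), while a misspelt ID such as "adjusted_vs_raw" raises an error rather than silently producing an empty plot.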

Notes For Review

Please check whether you are happy with the suggested plot types for each
validation metric.

I am happy with the current colour palette and font family, and I would like any
additional plots to follow the same visual theme.

I would like to keep the visualisation functions and package dependencies to a
minimum. The goal is to provide useful default validation displays without
making the plotting API or dependency surface too large.

Please consider whether any additional visualisation functions are needed. If
yes, please propose functions that follow the same theme, colour palette, and
font family.

In particular, feedback would be useful on whether there are better
visualisations for:

  • Level 4 distributional allocation metrics
  • Level 5 overall residual-structure metrics

This excludes the LISA map visualisations, which I would like to keep.

If new functions or visualisations are proposed, please start a discussion on
this PR. Please do not push changes directly to main; we should agree on the
direction first.

Validation

Ran:

  • git diff --check
  • git diff --cached --check
  • focused plotting tests: tests/testthat/test-plot-validation.R
  • rendered notes/project-management/VALIDATION_VISUAL_PROTOTYPES.qmd to
    notes/project-management/VALIDATION_VISUAL_PROTOTYPES.html, including the
    LISA map with user-supplied LAD and region boundaries

fcorowe force-pushed the codex/validation-vignette-aim branch from 024ccc4 to 0408fcc on June 15, 2026 13:35
fcorowe added the enhancement (New feature or request) and validation labels on Jun 15, 2026
carmen-cabrera marked this pull request as ready for review on June 18, 2026 15:18
carmen-cabrera self-requested a review as a code owner on June 18, 2026 15:18
Co-authored-by: Francisco Rowe <fcorowe@gmail.com>
carmen-cabrera (Collaborator) commented Jun 19, 2026

PR #71 Review Notes

Review approach:

  • I made a number of minor visual edits directly in the prototype because
    they are aesthetic and easy to assess visually.
  • Larger points about interpretation, labelling and the clarity of comparison
    arguments are kept as review comments rather than direct edits, so Francisco
    can decide whether and how to address them.

General note on aesthetic edits

Some minor aesthetic edits have been made directly in the validation visual
prototype Quarto notebook to improve the appearance and readability of the
figures. These edits are currently applied as additional ggplot2 layers after
calling the package plotting functions, rather than by changing the plotting
functions themselves.

If these visual changes are accepted, it may be worth transferring the relevant
ones into the definitions of the validation visualisation functions so that the
improved styling becomes part of the package defaults rather than only the
prototype notebook.

Cross-cutting point: make comparison structure visible

Review comment:

  • Across several visualisations, the notebook passes
    benchmark_comparisons = c("adjusted_vs_benchmark", "raw_vs_benchmark").
    Conceptually, this means the plots are comparing the benchmark against two
    different quantities: adjusted flows and raw MPD flows.
  • In the current visuals, this comparison structure is often encoded indirectly.
    For example, adjusted-versus-benchmark values appear under adjustment method
    rows, while raw-versus-benchmark values are folded into a separate
    Unadjusted raw MPD baseline row. This is technically coherent, but it makes
    the comparison dimension hard to see.
  • The issue recurs across the proposal: readers need to understand what is
    being compared with what, but this is sometimes only recoverable from the
    code arguments rather than from the plot itself.

Suggested improvement:

  • Make the comparison dimension explicit in plot labels, facet titles,
    legends, subtitles or accompanying text.
  • Where raw MPD is shown as a baseline row, explain clearly that this row
    corresponds to raw_vs_benchmark, while the adjusted-method rows correspond
    to adjusted_vs_benchmark.
  • Avoid using generic labels such as "Benchmark comparison" unless the plot
    also explains which benchmark comparison is being displayed.
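One way to make the comparison dimension explicit is to derive every facet title, legend entry or row label from the comparison ID, instead of from a generic heading. The sketch below is illustrative Python, not package code; the label wording, the COMPARISON_LABELS mapping and the comparison_label() helper are all assumptions.

```python
# Hypothetical mapping from comparison ID to a reader-facing label.
COMPARISON_LABELS = {
    "adjusted_vs_benchmark": "Adjusted flow vs benchmark",
    "raw_vs_benchmark": "Raw MPD flow vs benchmark",
    "raw_vs_adjusted": "Raw MPD flow vs adjusted flow",
}


def comparison_label(comparison_id, method=None):
    """Build a row/facet label that names the comparison, not just the method."""
    label = COMPARISON_LABELS[comparison_id]
    # Only comparisons that involve adjusted flows have an adjustment method;
    # the raw MPD baseline row is labelled by its comparison alone.
    if method is not None and "adjusted" in comparison_id:
        label = f"{label} ({method})"
    return label
```

Under this scheme, the baseline row would read "Raw MPD flow vs benchmark" and each method row would read, for example, "Adjusted flow vs benchmark (Method A)", so the figure carries the distinction without the reader needing the benchmark_comparisons argument.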

Level 1: Overall fit metrics

Requested change:

  • Avoid using white for the lowest relative-error category in the Level 1
    metric matrix.

Reason:

  • The 0-10 relative-error band currently appears white, which can look like
    missing data or an empty cell rather than the lowest-error category.

Suggested implementation:

  • Keep the existing yellow-to-blue visual direction.
  • Start the scale with a pale yellow instead of white.
  • Reserve white for page background or genuinely missing values.
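A minimal sketch of the suggested scale logic, assuming the relative-error score is binned into 10-point bands with an open-ended top band. The hex colours are placeholders chosen only to illustrate a pale-yellow start on a yellow-to-blue ramp; they are not the package palette, and the band_colour() helper is hypothetical. The point is that white is returned only for missing scores, never for a real band.

```python
import math

# Placeholder yellow-to-blue palette; the first band is pale yellow, not white.
BAND_COLOURS = ["#FFF7BC", "#C7E9B4", "#7FCDBB", "#41B6C4", "#2C7FB8", "#253494"]
MISSING_COLOUR = "#FFFFFF"  # white reserved for genuinely missing values
BAND_WIDTH = 10  # assumed band width in relative-error score points


def band_colour(score):
    """Map a relative-error score (%) to a fill colour."""
    if score is None or (isinstance(score, float) and math.isnan(score)):
        return MISSING_COLOUR
    # Band 0 covers [0, 10), band 1 covers [10, 20), ...; scores beyond the last
    # band fall into it, and any negative score is clamped into band 0.
    index = min(max(int(score // BAND_WIDTH), 0), len(BAND_COLOURS) - 1)
    return BAND_COLOURS[index]
```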

Follow-up requested change:

  • Split the three Level 1 metric labels over two lines so they do not look
    cramped at the bottom of the metric matrix.

Suggested implementation:

  • Use Mean absolute\nerror.
  • Use Root mean squared\nerror.
  • Use Mean absolute percentage\nerror.

Follow-up requested change:

  • Move the Level 1 relative-error legend from the bottom of the plot to the
    right-hand side.

Reason:

  • A vertical legend should make the ordered relative-error categories easier to
    read as a scale and reduce horizontal crowding under the metric matrix.

Follow-up requested change:

  • Split the Level 1 legend title over two lines: "Relative error" on the first
    line and "score (%)" on the second.

Review comment:

  • The Level 1 call passes comparisons = benchmark_comparisons, where
    benchmark_comparisons is defined earlier as adjusted_vs_benchmark and
    raw_vs_benchmark.
  • It is not clear from the resulting plot how those two comparisons are
    represented. The plot appears to show methods and error metrics, but the
    comparison dimension is not obvious to the reader.
  • The plot or accompanying text should make clearer whether the raw MPD versus
    benchmark comparison is shown as a baseline row, whether the adjusted versus
    benchmark comparison is shown for each method, and how these relate to the
    benchmark_comparisons argument.

Level 2 and Level 3: Residual distributions

Review comment:

  • The explanatory text is difficult to follow because it uses Y - X, x and
    y to describe the comparison convention. In the violin plot, however, the
    visible x-axis shows methods and the visible y-axis shows residual values.
    This makes it easy to confuse the internal comparison convention with the
    visual axes.
  • More generally, when residuals are plotted against a benchmark, it would be
    clearer to use the convention adjusted - benchmark or raw MPD - benchmark,
    rather than placing the benchmark first in the difference. This makes the
    benchmark the reference value and follows the common error convention of
    estimate minus reference value, so that positive residuals indicate
    overestimation.
  • The plot also uses the title "Benchmark comparison". This is confusing
    because benchmark_comparisons is defined earlier in the notebook as a
    vector containing two comparisons: adjusted_vs_benchmark and
    raw_vs_benchmark.
  • The resulting plot appears to show two violins per adjustment method, but it
    is not immediately clear which violin corresponds to which comparison or how
    those two comparisons are represented in the figure.

Suggested improvement:

  • Make the explanatory text describe the displayed comparisons directly,
    avoiding Y - X unless it is introduced after the plain-language
    explanation.
  • Make the plot labelling clearer so readers can distinguish adjusted versus
    benchmark residuals from raw MPD versus benchmark residuals.
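The suggested sign convention and labelling can be illustrated with a small sketch: residuals are computed as estimate minus benchmark, so positive values mean overestimation, and each residual carries the comparison it belongs to, which is what a legend or facet label would need to tell the two violins apart. This is illustrative Python with toy flow values, not the package implementation; the Residual record and residuals() helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Residual:
    comparison: str  # e.g. "Adjusted flow vs benchmark" or "Raw MPD flow vs benchmark"
    method: str      # adjustment method, or "Raw MPD" for the baseline
    value: float     # estimate minus benchmark


def residuals(estimates, benchmark, comparison, method):
    """Return estimate - benchmark for each OD pair present in both inputs."""
    return [
        Residual(comparison, method, estimates[od] - benchmark[od])
        for od in estimates
        if od in benchmark
    ]


# Toy OD flows keyed by (origin, destination); not data from the prototype.
benchmark = {("A", "B"): 100.0, ("A", "C"): 40.0}
adjusted = {("A", "B"): 110.0, ("A", "C"): 35.0}
raw = {("A", "B"): 60.0, ("A", "C"): 20.0}
```

With these toy values, residuals(adjusted, benchmark, ...) gives 10 and -5, and residuals(raw, benchmark, ...) gives -40 and -20: the raw MPD flows underestimate both pairs, and the sign says so directly.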

Level 2 and Level 3: Residual scatterplot

Review comment:

  • The scatterplot is clearer than the violin plots, but the comparison
    dimension is still difficult to read. The x-axis can represent either
    adjusted flows or raw MPD flows, while the y-axis represents the benchmark
    flow. However, the plot does not make it obvious which points or panels refer
    to adjusted flows and which refer to raw flows.
  • This becomes especially confusing in the final scatterplot panel, where the
    panel is titled "Unadjusted raw MPD" but the shared x-axis is labelled
    "Adjusted or Unadjusted MPD flow". The reader has to infer that the x-axis
    label is generic across panels while the facet title gives the specific
    series for that panel.
  • This is part of a broader issue across the validation visualisations: several
    plots compare the benchmark against two different quantities, adjusted flows
    and raw MPD flows, but the visual labelling does not always show clearly
    where each comparison appears.

Suggested improvement:

  • Label the comparison directly in the plot, for example with clearer facet
    labels such as "Adjusted flow vs benchmark" and "Raw MPD flow vs benchmark".
  • Consider making the x-axis title or panel subtitles explicit about whether
    the plotted estimate is adjusted or raw MPD.
  • If a shared x-axis label is retained, avoid mixing a generic label with facet
    titles that separately identify adjusted and unadjusted series unless the
    text explains that convention explicitly.
  • Avoid relying only on the benchmark_comparisons object or the plot title to
    communicate this distinction, because readers seeing the figure alone may not
    know which comparison each set of points represents.

Level 3: Residual outlier bands

Review comment:

  • The residual outlier band plots need more explanatory text for a general
    reader. The current notebook explains that the plots show residual severity
    bands, but it does not make sufficiently clear what is being counted.
  • A reader needs to know that each stacked bar shows the share of OD pairs whose
    absolute residual falls into each severity band. These are residual-magnitude
    summaries across OD pairs, not signed over/underestimation plots, not maps and
    not total-error metrics.
  • The distinction between the standard-deviation version and the quantile
    version should also be explained more plainly. The standard-deviation plot
    groups residuals by distance from zero in SD units, while the quantile plot
    groups residuals using shared cut points from the absolute-residual
    distribution.

Suggested improvement:

  • Add a short plain-language paragraph before these plots explaining what one
    bar represents, what each coloured segment represents and how to interpret a
    method with more mass in the higher-severity bands.
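What one stacked bar represents can be sketched directly: for one method, it is the share of OD pairs whose absolute residual falls into each band. The two banding rules below follow the description above (distance from zero in SD units; shared cut points from the pooled absolute-residual distribution), but the thresholds (1, 2 and 3 SD; quartiles), passing the SD in as a parameter and the helper names are assumptions for illustration, not the package defaults.

```python
import bisect
import statistics


def band_shares(abs_residuals, cut_points):
    """Share of OD pairs in each band defined by ascending cut points.

    Band i holds values in [cut_points[i-1], cut_points[i]); values at or above
    the last cut point fall in the top band. Returns len(cut_points) + 1 shares.
    """
    counts = [0] * (len(cut_points) + 1)
    for value in abs_residuals:
        counts[bisect.bisect_right(cut_points, value)] += 1
    return [c / len(abs_residuals) for c in counts]


def sd_band_shares(residuals, sd, multiples=(1, 2, 3)):
    """SD version: bands <1 SD, 1-2 SD, 2-3 SD and >=3 SD from zero."""
    cut_points = [m * sd for m in multiples]
    return band_shares([abs(r) for r in residuals], cut_points)


def quantile_band_shares(residuals_by_method, n=4):
    """Quantile version: one set of cut points shared by every method, taken
    from the pooled absolute residuals, so bars are directly comparable."""
    pooled = [abs(r) for values in residuals_by_method.values() for r in values]
    cut_points = statistics.quantiles(pooled, n=n)
    return {
        method: band_shares([abs(r) for r in values], cut_points)
        for method, values in residuals_by_method.items()
    }
```

Because each bar's shares sum to 1, a method with more of its bar in the upper bands simply has a larger fraction of OD pairs with large absolute residuals; the bars say nothing about the sign of those residuals.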

Level 4: Distributional allocation summary

Review comment:

  • Jensen-Shannon divergence is a useful metric here, but the notebook should
    give readers more intuition about its scale. It is clear that 0 means
    perfect alignment with the benchmark, but it is not clear how large the
    metric can become or how to interpret a value such as 0.158.
  • In the current implementation, JSD appears to be computed using natural
    logarithms, so the theoretical range is 0 to log(2), approximately
    0.693. It would help to clarify this range near the first JSD
    visualisation.
  • It may also be worth considering whether a normalised version scaled from 0
    to 1 would be more intuitive for users, especially in teaching material and
    package documentation.
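The range point can be checked with a short sketch. With natural logarithms, the Jensen-Shannon divergence between two discrete distributions is bounded by ln 2 (about 0.693); dividing by ln 2, which is equivalent to using base-2 logarithms, rescales it to the range 0 to 1. This is illustrative Python, not the debiasR implementation; it assumes both inputs are non-negative flow vectors over the same OD pairs, each normalised to shares before comparison.

```python
import math


def _kl(p, q):
    """Kullback-Leibler divergence in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(x, y, normalise=False):
    """Jensen-Shannon divergence between two non-negative vectors of equal length.

    Inputs are rescaled to sum to 1. With natural logs the result lies in
    [0, ln 2]; normalise=True divides by ln 2 so the result lies in [0, 1].
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sx, sy = sum(x), sum(y)
    p = [v / sx for v in x]
    q = [v / sy for v in y]
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    value = 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
    return value / math.log(2) if normalise else value
```

Identical distributions give 0 and fully disjoint ones give ln 2, or 1.0 when normalised. If the reported 0.158 is a natural-log value, it corresponds to roughly 0.158 / 0.693, or about 0.23 of the maximum possible divergence. Note also that SciPy's scipy.spatial.distance.jensenshannon returns the Jensen-Shannon distance (the square root of the divergence), so it is worth stating explicitly which quantity the package reports.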

Review comment:

  • The one-column heatmap for the distributional allocation summary is
    technically readable, but visually it feels underpowered because there is only
    one active comparison dimension. Heatmaps are most useful when both axes carry
    meaningful structure; here, the method dimension is doing most of the work.
  • The following Level 4 bar plot is more intuitive because bar length directly
    represents the JSD value. It may be clearer to keep only one of the two Level
    4 summary plots, with the bar plot preferred over the one-column heatmap,
    unless there is a specific reason to retain both.
  • The bar plot itself could still be improved. Because the adjusted methods have
    similar JSD values, the bars look nearly the same length, which makes it hard
    to compare methods visually. This may require better scale treatment,
    annotation, sorting, a reference line for the raw MPD baseline or another
    design choice that makes small but meaningful differences easier to read.
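One possible treatment for near-identical bars is to sort the methods and annotate each bar with its change relative to the raw MPD baseline, which would also be drawn as a reference line. The sketch below only computes those annotation values; it assumes one JSD value per method plus one baseline value, and the baseline_annotations() helper is hypothetical rather than part of the package.

```python
def baseline_annotations(jsd_by_method, baseline_jsd):
    """Sort methods by JSD (closest to the benchmark first) and compute the
    percentage change of each method relative to the raw MPD baseline.

    Negative percentages mean the method is closer to the benchmark than raw MPD.
    """
    if baseline_jsd <= 0:
        raise ValueError("baseline_jsd must be positive to compute a relative change")
    ordered = sorted(jsd_by_method.items(), key=lambda item: item[1])
    return [
        (method, value, 100.0 * (value - baseline_jsd) / baseline_jsd)
        for method, value in ordered
    ]
```

Labelling each bar with, say, "-50% vs raw MPD" makes a small absolute gap between methods readable even when the bars themselves look the same length.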

Level 5: Residual-structure diagnostics

Review comment:

  • The Level 5 residual-structure diagnostics should be connected more clearly
    to the residual diagnostics used earlier in the assessing-bias workflow.
  • In the assessing-bias section, validate_bias_residual_structure() analyses
    residuals from population-count coverage or representation bias, for example
    coverage-score residuals, user-count residuals, standardized user-count
    residuals or population-only model residuals.
  • In the validation section, validate_flow_residual_structure() analyses
    residuals from OD-flow comparisons, such as adjusted versus benchmark flows.
    The diagnostic logic is similar, but the object being diagnosed is different:
    population-count representation residuals versus OD-flow validation residuals.

Suggested improvement:

  • Add a short explanation in the vignette or prototype text that makes this
    distinction explicit, so readers understand that the package applies a shared
    residual-structure logic in two places, but to different residual objects.

Review comment:

  • The third Level 5 panel, residual-covariate correlation, currently has no
    plotted values in the prototype. This makes the panel look unfinished and
    leaves it unclear how users should interpret or use this diagnostic.
  • Before the visualisation is presented as a complete validation diagnostic,
    this panel should be populated with a worked residual-versus-covariate
    correlation example.

fcorowe merged commit a990231 into main on Jun 19, 2026 (3 checks passed)