Add validation visualisation functions by fcorowe · Pull Request #71 · de-bias/debiasR

fcorowe · 2026-06-15T08:50:53Z

Purpose

This PR adds package-level visualisation functions for displaying debiasR
validation metrics. The goal is to make validation results easier to inspect,
compare, and communicate across adjustment methods, raw MPD baselines, and
benchmark flows.

The visualisations are designed for validation outputs returned by the
validate_flow_*() functions. They provide exported plotting functions for
reviewing model fit, residual distributions, distributional allocation metrics,
and residual-structure diagnostics.

Rendered Proposal

I have included a rendered HTML version of the full visual proposal in this PR.
The rendered file is committed at
notes/project-management/VALIDATION_VISUAL_PROTOTYPES.html, together with its
companion VALIDATION_VISUAL_PROTOTYPES_files/ asset folder so that the figures
display correctly.

Summary

This PR:

- Adds canonical validation comparison IDs:
  - adjusted_vs_benchmark
  - raw_vs_benchmark
  - raw_vs_adjusted
- Sets adjusted_vs_benchmark as the default comparison.
- Adds exported plot_validation_*() functions for:
  - Level 1 overall fit metrics
  - Level 2/3 residual distributions
  - Level 2/3 pairwise flow scatterplots
  - Level 3 residual severity bands
  - Level 4 distributional divergence comparisons
  - Level 5 overall residual-structure diagnostics
  - Level 5 LISA cluster maps using user-supplied sf boundaries
- Integrates raw MPD vs benchmark as a baseline within the same plots where
  useful.
- Adds controls for comparison type, adjustment methods, error measures,
  sorting, and residual-band method.
- Updates documentation, tests, NEWS, project notes, and the validation visual
  prototype notebook.

Notes For Review

Please check whether you are happy with the suggested plot types for each
validation metric.

I am happy with the current colour palette and font family, and I would like any
additional plots to follow the same visual theme.

I would like to keep the visualisation functions and package dependencies to a
minimum. The goal is to provide useful default validation displays without
making the plotting API or dependency surface too large.

Please consider whether any additional visualisation functions are needed. If
so, please propose functions that follow the same theme, colour palette, and
font family.

In particular, feedback would be useful on whether there are better
visualisations for:

- Level 4 distributional allocation metrics
- Level 5 overall residual-structure metrics

This excludes the LISA maps, which I would like to keep.

If new functions or visualisations are proposed, please start a discussion on
this PR. Please do not push changes directly to main; we should agree on the
direction first.

Validation

Ran:

- git diff --check
- git diff --cached --check
- focused plotting tests: tests/testthat/test-plot-validation.R
- rendered notes/project-management/VALIDATION_VISUAL_PROTOTYPES.qmd to
  notes/project-management/VALIDATION_VISUAL_PROTOTYPES.html, including the
  LISA map with user-supplied LAD and region boundaries

Co-authored-by: Francisco Rowe <fcorowe@gmail.com>

carmen-cabrera · 2026-06-19T06:27:17Z

PR #71 Review Notes

Review approach:

I made a number of minor visual edits directly in the prototype because
they are aesthetic and easy to assess visually.
Larger points about interpretation, labelling and the clarity of comparison
arguments are kept as review comments rather than direct edits, so Francisco
can decide whether and how to address them.

General note on aesthetic edits

Some minor aesthetic edits have been made directly in the validation visual
prototype Quarto notebook to improve the appearance and readability of the
figures. These edits are currently applied as additional ggplot2 layers after
calling the package plotting functions, rather than by changing the plotting
functions themselves.

If these visual changes are accepted, it may be worth transferring the relevant
ones into the definitions of the validation visualisation functions so that the
improved styling becomes part of the package defaults rather than only the
prototype notebook.

Cross-cutting point: make comparison structure visible

Review comment:

Across several visualisations, the notebook passes
benchmark_comparisons = c("adjusted_vs_benchmark", "raw_vs_benchmark").
Conceptually, this means the plots are comparing the benchmark against two
different quantities: adjusted flows and raw MPD flows.
In the current visuals, this comparison structure is often encoded indirectly.
For example, adjusted-versus-benchmark values appear under adjustment method
rows, while raw-versus-benchmark values are folded into a separate
Unadjusted raw MPD baseline row. This is technically coherent, but it makes
the comparison dimension hard to see.
The issue recurs across the proposal: readers need to understand what is
being compared with what, but this is sometimes only recoverable from the
code arguments rather than from the plot itself.

Suggested improvement:

Make the comparison dimension explicit in plot labels, facet titles,
legends, subtitles or accompanying text.
Where raw MPD is shown as a baseline row, explain clearly that this row
corresponds to raw_vs_benchmark, while the adjusted-method rows correspond
to adjusted_vs_benchmark.
Avoid using generic labels such as Benchmark comparison unless the plot
also explains which benchmark comparison is being displayed.
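A minimal sketch of how the comparison dimension could be surfaced consistently: one lookup from the canonical comparison IDs to reader-facing labels, reused for facet titles, legends and subtitles. The label wording and the helper name label_comparison() are hypothetical and not part of the debiasR API.

```r
# Hypothetical reader-facing labels for the canonical comparison IDs.
comparison_labels <- c(
  adjusted_vs_benchmark = "Adjusted MPD vs benchmark",
  raw_vs_benchmark      = "Raw MPD vs benchmark (baseline)",
  raw_vs_adjusted       = "Raw MPD vs adjusted MPD"
)

# Translate comparison IDs into labels, failing loudly on unknown IDs so a
# typo never silently produces an unlabelled panel.
label_comparison <- function(id) {
  unknown <- setdiff(id, names(comparison_labels))
  if (length(unknown) > 0) {
    stop("Unknown comparison ID: ", paste(unknown, collapse = ", "))
  }
  unname(comparison_labels[id])
}

# Example: a subtitle stating exactly which comparisons a plot displays.
shown <- c("adjusted_vs_benchmark", "raw_vs_benchmark")
subtitle <- paste("Comparisons shown:",
                  paste(label_comparison(shown), collapse = "; "))
```

Keeping a single lookup means every plot describes a given comparison with the same words, rather than each function inventing its own generic title.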

Level 1: Overall fit metrics

Requested change:

Avoid using white for the lowest relative-error category in the Level 1
metric matrix.

Reason:

The 0-10 relative-error band currently appears white, which can look like
missing data or an empty cell rather than the lowest-error category.

Suggested implementation:

Keep the existing yellow-to-blue visual direction.
Start the scale with a pale yellow instead of white.
Reserve white for page background or genuinely missing values.
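A minimal sketch of the palette change as a ggplot2 layer, assuming five relative-error bands. The two hex endpoints are placeholders for the package's own yellow and blue, and na.value keeps white for genuinely missing cells only.

```r
library(ggplot2)

# Placeholder endpoints: pale yellow (not white) for the lowest band,
# running to blue. Replace with the package palette colours.
n_bands <- 5  # assumed number of relative-error bands
level1_palette <- grDevices::colorRampPalette(c("#FFF7BC", "#2C7FB8"))(n_bands)

# White is reserved for cells whose fill value is NA.
level1_fill <- scale_fill_manual(values = level1_palette, na.value = "white")
```

Added to the Level 1 plot object (p + level1_fill), this replaces the existing fill scale, in the same way as the other post-hoc layers in the prototype notebook.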

Follow-up requested change:

Split the three Level 1 metric labels over two lines so they do not look
cramped at the bottom of the metric matrix.

Suggested implementation:

Use Mean absolute\nerror.
Use Root mean squared\nerror.
Use Mean absolute percentage\nerror.
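A sketch of the two-line labels as an x-axis scale, assuming the metrics sit on the x axis of the matrix. The metric codes mae, rmse and mape are hypothetical; the names must match the values the package plot actually maps to x.

```r
library(ggplot2)

# Hypothetical metric codes mapped to two-line display labels.
metric_labels <- c(
  mae  = "Mean absolute\nerror",
  rmse = "Root mean squared\nerror",
  mape = "Mean absolute percentage\nerror"
)
metric_axis <- scale_x_discrete(labels = metric_labels)
```

Adding this to a plot that already defines an x scale makes ggplot2 report that the existing scale is being replaced; moving the labels into the plotting function itself would avoid that message.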

Follow-up requested change:

Move the Level 1 relative-error legend from the bottom of the plot to the
right-hand side.

Reason:

A vertical legend should make the ordered relative-error categories easier to
read as a scale and reduce horizontal crowding under the metric matrix.

Follow-up requested change:

Split the Level 1 legend title into two lines: Relative error and
score (%).
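The two legend changes can be sketched as layers on a toy stand-in for the Level 1 matrix. The methods, metrics and band values below are hypothetical, not package output.

```r
library(ggplot2)

# Toy stand-in for the Level 1 metric matrix (hypothetical values).
level1 <- expand.grid(
  method = c("Method A", "Method B"),
  metric = c("MAE", "RMSE", "MAPE")
)
level1$band <- factor(c("0-10", "10-20", "0-10", "20-30", "10-20", "20-30"))

p <- ggplot(level1, aes(x = metric, y = method, fill = band)) +
  geom_tile(colour = "grey85") +
  labs(fill = "Relative error\nscore (%)") +  # two-line legend title
  theme(legend.position = "right")            # vertical legend on the right
```

Because later theme() layers override earlier ones, the same two layers appended after a package call that places the legend at the bottom should move it to the right.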

Review comment:

The Level 1 call passes comparisons = benchmark_comparisons, where
benchmark_comparisons is defined earlier as adjusted_vs_benchmark and
raw_vs_benchmark.
It is not clear from the resulting plot how those two comparisons are
represented. The plot appears to show methods and error metrics, but the
comparison dimension is not obvious to the reader.
The plot or accompanying text should make clearer whether the raw MPD versus
benchmark comparison is shown as a baseline row, whether the adjusted versus
benchmark comparison is shown for each method, and how these relate to the
benchmark_comparisons argument.

Level 2 and Level 3: Residual distributions

Review comment:

The explanatory text is difficult to follow because it uses Y - X, x and
y to describe the comparison convention. In the violin plot, however, the
visible x-axis shows methods and the visible y-axis shows residual values.
This makes it easy to confuse the internal comparison convention with the
visual axes.
More generally, when residuals are plotted against a benchmark, it would be
clearer to use the convention adjusted - benchmark or raw MPD - benchmark, rather than placing the benchmark first in the difference. This
makes the benchmark the reference value and aligns better with the usual
interpretation of residuals as estimate minus observed or reference value.
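A small worked example of the proposed convention, using hypothetical flows for four OD pairs. Each residual is estimate minus benchmark, so negative values mean underestimation, and the comparison column names the difference in plain language.

```r
# Hypothetical flows for four OD pairs (illustrative values only).
benchmark <- c(120, 45, 300, 10)
adjusted  <- c(110, 50, 280, 14)
raw_mpd   <- c(60, 30, 150, 5)

# Estimate minus benchmark: negative = underestimation, positive = overestimation.
residuals_long <- data.frame(
  comparison = rep(c("Adjusted - benchmark", "Raw MPD - benchmark"),
                   each = length(benchmark)),
  residual   = c(adjusted - benchmark, raw_mpd - benchmark)
)
```

Mapping the comparison column to the violin fill and legend would let readers tell the two violins per method apart without consulting the benchmark_comparisons argument.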
The plot also uses the title Benchmark comparison. This is confusing
because benchmark_comparisons is defined earlier in the notebook as a
vector containing two comparisons: adjusted_vs_benchmark and
raw_vs_benchmark.
The resulting plot appears to show two violins per adjustment method, but it
is not immediately clear which violin corresponds to which comparison or how
those two comparisons are represented in the figure.

Suggested improvement:

Make the explanatory text describe the displayed comparisons directly,
avoiding Y - X unless it is introduced after the plain-language
explanation.
Make the plot labelling clearer so readers can distinguish adjusted versus
benchmark residuals from raw MPD versus benchmark residuals.

Level 2 and Level 3: Residual scatterplot

Review comment:

The scatterplot is clearer than the violin plots, but the comparison
dimension is still difficult to read. The x-axis can represent either
adjusted flows or raw MPD flows, while the y-axis represents the benchmark
flow. However, the plot does not make it obvious which points or panels refer
to adjusted flows and which refer to raw flows.
This becomes especially confusing in the final scatterplot panel, where the
panel is titled Unadjusted raw MPD but the shared x-axis is labelled
Adjusted or Unadjusted MPD flow. The reader has to infer that the x-axis
label is generic across panels while the facet title gives the specific
series for that panel.
This is part of a broader issue across the validation visualisations: several
plots compare the benchmark against two different quantities, adjusted flows
and raw MPD flows, but the visual labelling does not always show clearly
where each comparison appears.

Suggested improvement:

Label the comparison directly in the plot, for example with clearer facet
labels such as Adjusted flow vs benchmark and Raw MPD flow vs benchmark.
Consider making the x-axis title or panel subtitles explicit about whether
the plotted estimate is adjusted or raw MPD.
If a shared x-axis label is retained, avoid mixing a generic label with facet
titles that separately identify adjusted and unadjusted series unless the
text explains that convention explicitly.
Avoid relying only on the benchmark_comparisons object or the plot title to
communicate this distinction, because readers seeing the figure alone may not
know which comparison each set of points represents.
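A sketch of explicit facet labelling for the scatterplot, using hypothetical long-format flows keyed by the canonical comparison IDs. Facet titles carry the specific series, and the x-axis title states that convention rather than mixing a generic label with panel-specific titles.

```r
library(ggplot2)

# Hypothetical long-format flows: one row per OD pair and comparison.
flows <- data.frame(
  comparison = rep(c("adjusted_vs_benchmark", "raw_vs_benchmark"), each = 4),
  estimate   = c(110, 50, 280, 14, 60, 30, 150, 5),
  benchmark  = rep(c(120, 45, 300, 10), times = 2)
)

# Facet titles name the series plotted on the x axis of each panel.
panel_labels <- as_labeller(c(
  adjusted_vs_benchmark = "Adjusted flow vs benchmark",
  raw_vs_benchmark      = "Raw MPD flow vs benchmark"
))

p <- ggplot(flows, aes(x = estimate, y = benchmark)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +  # perfect agreement
  geom_point() +
  facet_wrap(~ comparison, labeller = panel_labels) +
  labs(x = "Estimated flow (series named in panel title)",
       y = "Benchmark flow")
```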

Level 3: Residual outlier bands

Review comment:

The residual outlier band plots need more explanatory text for a general
reader. The current notebook explains that the plots show residual severity
bands, but it does not make sufficiently clear what is being counted.
A reader needs to know that each stacked bar shows the share of OD pairs whose
absolute residual falls into each severity band. These are residual-magnitude
summaries across OD pairs, not signed over/underestimation plots, not maps and
not total-error metrics.
The distinction between the standard-deviation version and the quantile
version should also be explained more plainly. The standard-deviation plot
groups residuals by distance from zero in SD units, while the quantile plot
groups residuals using shared cut points from the absolute-residual
distribution.
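A sketch of what each stacked bar summarises: the share of OD pairs whose absolute residual falls in each band. The residuals, the three band names, the quantile probabilities (0.5, 0.9) and the SD multiples (1, 2) are all assumptions for illustration, not the package's settings; pooling across methods is what makes the quantile cut points shared.

```r
# Hypothetical residuals (estimate - benchmark) for two methods.
resid <- list(
  method_a = c(-12, 3, 1, 25, -4, 8, -1, 40, -30, 2),
  method_b = c(-5, 2, 1, 6, -3, 4, -1, 9, -7, 2)
)
band_labels <- c("Low", "Moderate", "High")

# Share of OD pairs whose absolute residual falls in each band (one bar).
band_shares <- function(r, breaks) {
  bands <- cut(abs(r), breaks = breaks, labels = band_labels,
               include.lowest = TRUE)
  prop.table(table(bands))
}

# Quantile version: cut points from the pooled absolute residuals, so every
# method is binned on the same scale.
q_breaks <- c(0, quantile(abs(unlist(resid)), c(0.5, 0.9), names = FALSE), Inf)
q_shares <- lapply(resid, band_shares, breaks = q_breaks)

# SD version: distance from zero in SD units (SD of pooled residuals assumed).
sd_breaks <- c(0, 1, 2, Inf) * sd(unlist(resid))
sd_shares <- lapply(resid, band_shares, breaks = sd_breaks)
```

Here method_a places 20% of OD pairs in the High band and method_b none, which is the pattern a reader should read as "more mass in the higher-severity bands". With heavily tied residuals the quantile cut points can coincide, which cut() rejects, so a real implementation would need to handle that case.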

Suggested improvement:

Add a short plain-language paragraph before these plots explaining what one
bar represents, what each coloured segment represents and how to interpret a
method with more mass in the higher-severity bands.

Level 4: Distributional allocation summary

Review comment:

The use of Jensen-Shannon divergence is useful, but the notebook should give
readers more intuition about the scale of the metric. It is clear that 0
means perfect alignment with the benchmark, but it is not clear how large the
metric can become or how to interpret a value such as 0.158.
In the current implementation, JSD appears to be computed using natural
logarithms, so the theoretical range is 0 to log(2), approximately
0.693. It would help to clarify this range near the first JSD
visualisation.
It may also be worth considering whether a normalised version scaled from 0
to 1 would be more intuitive for users, especially in teaching material and
package documentation.
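One standard way to compute Jensen-Shannon divergence with natural logarithms, plus the normalised variant, sketched to make the scale concrete. This is not necessarily debiasR's implementation; p and q stand for two flow distributions over the same OD pairs.

```r
# Kullback-Leibler divergence in nats; terms with p = 0 contribute zero.
kl_div <- function(p, q) {
  keep <- p > 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

# Jensen-Shannon divergence with natural logs: bounded by [0, log(2)].
jsd <- function(p, q) {
  p <- p / sum(p)
  q <- q / sum(q)
  m <- (p + q) / 2
  0.5 * kl_div(p, m) + 0.5 * kl_div(q, m)
}

# Dividing by log(2) rescales to [0, 1]; this equals JSD computed in base 2.
jsd_normalised <- function(p, q) jsd(p, q) / log(2)
```

Identical distributions give 0 and distributions with no overlap give log(2). On the normalised scale, a value of 0.158 corresponds to 0.158 / log(2), about 0.23.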

Review comment:

The one-column heatmap for the distributional allocation summary is
technically readable, but visually it feels underpowered because there is only
one active comparison dimension. Heatmaps are most useful when both axes carry
meaningful structure; here, the method dimension is doing most of the work.
The following Level 4 bar plot is more intuitive because bar length directly
represents the JSD value. It may be clearer to keep only one of the two Level
4 summary plots, with the bar plot preferred over the one-column heatmap,
unless there is a specific reason to retain both.
The bar plot itself could still be improved. Because the adjusted methods have
similar JSD values, the bars look nearly the same length, which makes it hard
to compare methods visually. This may require better scale treatment,
annotation, sorting, a reference line for the raw MPD baseline or another
design choice that makes small but meaningful differences easier to read.
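A sketch combining three of these options: bars sorted with the lowest divergence at the top, printed values so small differences stay legible without truncating the axis, and a dashed reference line for the raw MPD baseline. The methods and JSD values are hypothetical and for illustration only.

```r
library(ggplot2)

# Hypothetical JSD values (illustration only, not debiasR results).
jsd_tbl <- data.frame(
  method = c("Method A", "Method B", "Method C"),
  jsd    = c(0.152, 0.158, 0.161)
)
raw_baseline <- 0.240  # hypothetical raw MPD vs benchmark JSD

# reorder() on -jsd puts the lowest divergence at the top of the y axis.
p <- ggplot(jsd_tbl, aes(x = jsd, y = reorder(method, -jsd))) +
  geom_col(fill = "grey60", width = 0.6) +
  # Printed values make small differences readable; bars still start at 0.
  geom_text(aes(label = sprintf("%.3f", jsd)), hjust = -0.15, size = 3) +
  # Dashed reference line for the raw MPD baseline.
  geom_vline(xintercept = raw_baseline, linetype = "dashed") +
  scale_x_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(x = "Jensen-Shannon divergence, adjusted vs benchmark",
       y = NULL,
       subtitle = "Dashed line: raw MPD vs benchmark baseline")
```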

Level 5: Residual-structure diagnostics

Review comment:

The Level 5 residual-structure diagnostics should be connected more clearly
to the residual diagnostics used earlier in the assessing-bias workflow.
In the assessing-bias section, validate_bias_residual_structure() analyses
residuals from population-count coverage or representation bias, for example
coverage-score residuals, user-count residuals, standardized user-count
residuals or population-only model residuals.
In the validation section, validate_flow_residual_structure() analyses
residuals from OD-flow comparisons, such as adjusted versus benchmark flows.
The diagnostic logic is similar, but the object being diagnosed is different:
population-count representation residuals versus OD-flow validation residuals.

Suggested improvement:

Add a short explanation in the vignette or prototype text that makes this
distinction explicit, so readers understand that the package applies a shared
residual-structure logic in two places, but to different residual objects.

Review comment:

The third Level 5 panel, residual-covariate correlation, currently has no
plotted values in the prototype. This makes the panel look unfinished and
leaves it unclear how users should interpret or use this diagnostic.
Before the visualisation is presented as a complete validation diagnostic,
this panel should be populated with values from a worked residual-versus-
covariate correlation example.
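A minimal sketch of the quantity such a panel could display, assuming one residual and one covariate value per OD pair. Distance is a hypothetical example covariate, and Spearman's rank correlation is an assumed choice (less sensitive to extreme residuals than Pearson), not necessarily what validate_flow_residual_structure() computes.

```r
# Hypothetical OD-flow residuals (adjusted - benchmark) and one covariate.
od <- data.frame(
  residual    = c(-2, 1, 3, 7, 15, -4),
  distance_km = c(5, 12, 30, 48, 80, 2)
)

# Rank correlation between residuals and the covariate.
rho <- cor(od$residual, od$distance_km, method = "spearman")

# cor.test() adds a p-value, giving the panel something to report per covariate.
ct <- cor.test(od$residual, od$distance_km, method = "spearman")
```

In this toy case residuals rise monotonically with distance, so rho is exactly 1, the kind of systematic pattern the panel is meant to flag.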

fcorowe added 4 commits June 15, 2026 14:35

docs: revise validation vignette aim

2cdf774

feat: standardize validation comparisons and visuals

cd459b1

feat: add validation visualisation functions

a95f026

docs: add validation visual prototype preview

0408fcc

fcorowe force-pushed the codex/validation-vignette-aim branch from 024ccc4 to 0408fcc on June 15, 2026 13:35

Add validation plots to pkgdown reference

474c975

fcorowe added the enhancement and validation labels on Jun 15, 2026

carmen-cabrera marked this pull request as ready for review June 18, 2026 15:18

carmen-cabrera self-requested a review as a code owner June 18, 2026 15:18

docs: refine validation visual prototype

389e2d8

Co-authored-by: Francisco Rowe <fcorowe@gmail.com>

Address validation visual review feedback

5638d77

fcorowe merged commit a990231 into main Jun 19, 2026
3 checks passed

carmen-cabrera mentioned this pull request Jun 19, 2026

Record validation visualisation vignette decision #72

Merged

fcorowe merged 7 commits into main from codex/validation-vignette-aim
