
VIM vimpute update – MI, bootstrap, GAM, robust GAM #103

@matthias-da

Hi all,

quick update on what happened in vimpute on the feature branch (feature/vimpute-mi-bootstrap).

Multiple Imputation (m > 1)

vimpute now supports proper multiple imputation. Set m = 5 and you get back a vimmi object (our own lightweight S3 class, inspired by mice's mids but without the dependency). It stores the original data once and, per variable and imputation, only the imputed values, which keeps memory use low. There's complete(result, 1) to extract a single completed dataset, complete(result, "long") for long format, with(result, lm(y ~ x)) for fitting models across imputations, and as.mids.vimmi() if you want to convert to mice for pooling. When m = 1 (default), everything works exactly as before, so there are no breaking changes.
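
To make the storage idea concrete, here is a minimal base R sketch. It is not the vimmi implementation: the list layout and the helper name complete_sketch are hypothetical, and mean-plus-noise fills only stand in for real model-based imputations.

```r
# Hypothetical sketch of "store the data once, plus imputed values only".
set.seed(1)
dat <- data.frame(x = c(1, NA, 3, 4, NA), y = c(2.1, 3.9, NA, 8.2, 9.8))
m <- 3

# Row indices of the missing cells, per variable
miss <- lapply(dat, function(col) which(is.na(col)))

# For each variable: an (n_missing x m) matrix of imputed values.
# Mean plus noise is only a placeholder for a real imputation model.
imp <- lapply(names(dat), function(v) {
  obs <- dat[[v]][!is.na(dat[[v]])]
  matrix(mean(obs) + rnorm(length(miss[[v]]) * m, sd = sd(obs)),
         nrow = length(miss[[v]]), ncol = m)
})
names(imp) <- names(dat)

store <- list(data = dat, where = miss, imp = imp, m = m)

# Rebuild the i-th completed data set on demand
complete_sketch <- function(store, i) {
  out <- store$data
  for (v in names(store$imp)) {
    out[[v]][store$where[[v]]] <- store$imp[[v]][, i]
  }
  out
}

d1 <- complete_sketch(store, 1)
anyNA(d1)  # FALSE: every missing cell has been filled
```

With this layout the extra memory grows with the number of missing cells times m, not with the size of the full data set times m.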

Bootstrap & Uncertainty

Two new layers for proper between-imputation variability:

  • boot = TRUE with robustboot parameter: 5 bootstrap strategies (standard, stratified, quantile, residual, psi/Tukey bisquare). Refits the model on bootstrapped training data for model
    uncertainty.
  • uncert parameter: normalerror (add N(0, sigma)), resid (add sampled residual), pmm (predictive mean matching), midastouch (covariate-distance-weighted PMM à la Siddique & Belin 2008).
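
As a rough illustration of what the first three uncert options do conceptually, the sketch below perturbs predictions from a plain lm fit in base R. It is an assumption-laden stand-in, not the vimpute code: the linear model, the donor pool size k = 5, and all variable names are chosen only for the example.

```r
# Hypothetical sketch of three ways to add uncertainty to a model prediction.
set.seed(42)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
y[sample(n, 10)] <- NA                 # make 10 responses missing

obs <- !is.na(y)
fit <- lm(y ~ x, subset = obs)
pred_mis <- predict(fit, newdata = data.frame(x = x[!obs]))
pred_obs <- fitted(fit)                # predictions for the observed rows
sigma_hat <- summary(fit)$sigma        # residual standard error

# normalerror-style: add N(0, sigma) noise to the prediction
imp_normal <- pred_mis + rnorm(length(pred_mis), sd = sigma_hat)

# resid-style: add a residual sampled from the fitted model
imp_resid <- pred_mis + sample(residuals(fit), length(pred_mis), replace = TRUE)

# pmm-style: donate an observed value whose prediction is among the
# k closest to the prediction for the missing row (k = 5 is an assumption)
k <- 5
y_obs <- y[obs]
imp_pmm <- vapply(pred_mis, function(p) {
  donors <- order(abs(pred_obs - p))[seq_len(k)]
  y_obs[sample(donors, 1)]
}, numeric(1))
```

The pmm variant only ever returns values that were actually observed, whereas the first two can produce values outside the observed range.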

These two are orthogonal – you can use either or both. Existing pmm = TRUE still works and takes precedence.

method = "gam"

New! Uses mgcv::gam() under the hood, wrapped as custom mlr3 learners. Numeric predictors are automatically wrapped in s() terms, factors stay linear. If you pass a formula via the existing formula parameter, that's used instead (full control over smooth terms). Works for both regression and classification (binary via binomial, multiclass via One-vs-Rest).
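
The automatic formula construction can be sketched roughly as follows. The helper build_gam_formula is hypothetical and only mirrors the rule described above (s() for numeric predictors, plain terms for factors); the actual learner code on the branch may differ.

```r
library(mgcv)

# Hypothetical helper: s() for numeric predictors, plain terms for the rest.
build_gam_formula <- function(data, target) {
  preds <- setdiff(names(data), target)
  terms <- vapply(preds, function(v) {
    if (is.numeric(data[[v]])) sprintf("s(%s)", v) else v
  }, character(1))
  reformulate(terms, response = target)
}

# Simulated data with two numeric predictors and one factor
set.seed(1)
n <- 200
d <- data.frame(
  x1 = runif(n), x2 = runif(n),
  g = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
d$y <- sin(2 * pi * d$x1) + d$x2^2 + as.integer(d$g) / 2 + rnorm(n, sd = 0.2)

f <- build_gam_formula(d, "y")
f    # y ~ s(x1) + s(x2) + g
fit <- gam(f, data = d)
```

One caveat of such a rule: a numeric predictor with only a handful of distinct values can be too coarse for the default basis dimension of s(), which is one reason to supply an explicit formula in those cases.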

method = "robgam"

Also new. Robust GAM with two strategies, controlled via learner_params = list(robgam = list(robust_method = "simple")):

  • "simple" (default): fit GAM, identify outliers by residual quantile (alpha = 0.75), refit on clean subset
  • "irw": iterative reweighting with Tukey bisquare weights until convergence

Same classification support as gam (binary + multiclass). The alpha parameter and robust_method go through learner_params, so no new top-level parameters.
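
For readers unfamiliar with the "irw" idea, here is one possible reading of it as a sketch on simulated data. The details are assumptions, not the branch's code: residuals are standardised by their MAD, the bisquare tuning constant is the conventional 4.685, weights are floored at a tiny positive value to keep the fit well defined, and the stopping rule is a simple tolerance on the weight change.

```r
library(mgcv)

# Tukey bisquare weight function; c = 4.685 is the usual tuning constant.
bisquare <- function(r, c = 4.685) ifelse(abs(r) < c, (1 - (r / c)^2)^2, 0)

set.seed(7)
n <- 200
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)
out_idx <- sample(n, 10)
y[out_idx] <- y[out_idx] + 5             # inject gross outliers

w <- rep(1, n)                           # start from an unweighted fit
for (iter in 1:20) {
  fit <- gam(y ~ s(x), weights = w)
  res <- y - fitted(fit)
  scale <- mad(res)                      # robust residual scale
  # Floor at a tiny positive value (assumption) to avoid exact zero weights
  w_new <- pmax(bisquare(res / scale), 1e-6)
  if (max(abs(w_new - w)) < 1e-4) break  # weights have stabilised
  w <- w_new
}

mean(w_new[out_idx])    # near 0: injected outliers are downweighted
mean(w_new[-out_idx])   # clearly larger: regular points keep most weight
```

The "simple" strategy would replace the loop with a single pass: fit once, drop the observations with the largest residuals, and refit on the rest.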

What's next?

The branch is ready for review and for merging into origin/main. In the near future I plan to soft-deprecate imputeRobustChain, since vimpute with method = "robust" + sequential = TRUE + boot + uncert now covers the same ground (and more). imputeRobust stays as-is for backward compatibility.
