Feat(dSep): add ncores for parallel d-separation evaluation #329
Open
Lgz-tud wants to merge 4 commits into
testBasisSetElements was recomputing formulaList (and the response-to-model lookup) once per basis-set element, even though modelList does not change across iterations of the loop. Compute both once in dSep() and pass them into the inner function. No behavior change: outputs verified byte-identical (identical()) to the pre-refactor code on lm, glm, and lmer fixtures across all four combinations of (conserve, conditioning).
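A minimal sketch of the hoisting idea, assuming a toy setup: two `lm()` fits on `mtcars` stand in for `modelList`, and `build_resp_map()`, `fit_claim()`, and the `claims` structure are hypothetical names, not piecewiseSEM internals. Because the formula list and the response lookup depend only on the models, computing them once gives the same result as recomputing them for every claim:

```r
# Hypothetical sketch of the refactor's idea; names are illustrative only,
# not piecewiseSEM internals.
model_list <- list(
  lm(mpg ~ wt, data = mtcars),
  lm(hp ~ cyl, data = mtcars)
)

# Each claim: add one variable to the model whose response is `response`.
claims <- list(
  list(response = "mpg", added = "cyl"),
  list(response = "hp", added = "qsec")
)

# Map response name -> position in model_list.
build_resp_map <- function(formula_list) {
  resp <- vapply(formula_list, function(f) all.vars(f)[1L], character(1))
  setNames(seq_along(formula_list), resp)
}

# Inner worker: fit one claim and return the p-value of the added term.
fit_claim <- function(claim, formula_list, resp_map) {
  i <- resp_map[[claim$response]]
  f <- update(formula_list[[i]], as.formula(paste(". ~ . +", claim$added)))
  fit <- lm(f, data = mtcars)
  coef(summary(fit))[claim$added, "Pr(>|t|)"]
}

# Before: formula_list and resp_map are rebuilt for every claim.
pvals_old <- lapply(claims, function(claim) {
  formula_list <- lapply(model_list, formula)
  fit_claim(claim, formula_list, build_resp_map(formula_list))
})

# After: both are loop-invariant, so compute them once and pass them in.
formula_list <- lapply(model_list, formula)
resp_map <- build_resp_map(formula_list)
pvals_new <- lapply(claims, fit_claim,
                    formula_list = formula_list, resp_map = resp_map)

stopifnot(identical(pvals_old, pvals_new))
```

The final `identical()` check mirrors the verification described in the commit message: the refactor changes how often the invariant work runs, not what each claim computes.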
Adds an `ncores` argument to dSep() that evaluates independence claims in
parallel:
* ncores = 1L (default): byte-identical to the pre-change behavior;
progress bar still works.
* ncores > 1L on Linux/macOS: parallel::mclapply (fork).
* ncores > 1L on Windows: parallel::parLapply on a PSOCK cluster
(with clusterEvalQ(requireNamespace) + clusterExport of the
captured loop state).
* Progress bar automatically disabled when ncores > 1, since
txtProgressBar cannot be shared across worker processes.
* Worker errors are surfaced via stop() instead of silently returning
try-error objects that would later break do.call(rbind, ...).
Adds `parallel` (base R) to DESCRIPTION Imports.
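The dispatch described above could look roughly like the following sketch. `par_map()` and its `export`/`packages` arguments are hypothetical; the commit message mentions `clusterEvalQ(requireNamespace)`, whereas this sketch uses `clusterCall()` so the package list can be passed in as a variable. `mclapply()`, `parLapply()`, `makeCluster()`, `clusterCall()`, `clusterExport()`, and `txtProgressBar()` are the real base-R functions.

```r
library(parallel)

# Hypothetical helper, not the piecewiseSEM code: apply FUN to each element
# of X serially (ncores = 1) or in parallel (ncores > 1).
par_map <- function(X, FUN, ncores = 1L, progress = TRUE,
                    export = character(), packages = character(),
                    envir = parent.frame()) {
  # Validate: a single, non-missing, positive whole number.
  if (!is.numeric(ncores) || length(ncores) != 1L || is.na(ncores) ||
      ncores < 1 || ncores != round(ncores)) {
    stop("`ncores` must be a single positive integer.", call. = FALSE)
  }
  ncores <- as.integer(ncores)

  if (ncores == 1L) {
    # Serial path: the only one that can drive a txtProgressBar.
    pb <- if (progress && length(X) > 0L) {
      txtProgressBar(min = 0, max = length(X), style = 3)
    }
    out <- vector("list", length(X))
    for (i in seq_along(X)) {
      out[i] <- list(FUN(X[[i]]))  # list() keeps NULL results in place
      if (!is.null(pb)) setTxtProgressBar(pb, i)
    }
    if (!is.null(pb)) close(pb)
    return(out)
  }

  if (.Platform$OS.type == "windows") {
    # No fork() on Windows: start a PSOCK cluster, load packages on every
    # worker, and ship the objects FUN needs from `envir`.
    cl <- makeCluster(ncores)
    on.exit(stopCluster(cl), add = TRUE)
    clusterCall(cl, function(pkgs) {
      for (p in pkgs) requireNamespace(p, quietly = TRUE)
      NULL
    }, packages)
    if (length(export)) clusterExport(cl, export, envir = envir)
    # parLapply re-raises worker errors in the calling process.
    return(parLapply(cl, X, FUN))
  }

  # Linux/macOS: forked workers inherit the parent's memory.
  out <- mclapply(X, FUN, mc.cores = ncores)
  failed <- vapply(out, inherits, logical(1), what = "try-error")
  if (any(failed)) {
    # mclapply returns try-error objects instead of failing; make it fail.
    stop("worker failed: ", out[[which(failed)[1L]]], call. = FALSE)
  }
  out
}
```

The asymmetry in error handling is deliberate: `parLapply()` already raises an error in the master when a worker fails, while `mclapply()` returns `try-error` objects (with a warning), so only the fork branch needs the explicit `stop()`.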
Backward compatibility: existing user code that does not pass `ncores`
continues to use the serial path with the original progress bar; outputs
verified byte-identical on lm, glm, and lmer fixtures.
Parallel path (ncores = 2L) also verified byte-identical to the serial
baseline on the same fixtures (mclapply fork preserves determinism for
these models).
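One way to reproduce this kind of check is to run the same job serially and in parallel for every flag combination and compare the results with `identical()`. In this sketch `run_claims()` is a hypothetical stand-in for `dSep()`: its two logical flags only imitate the shape of the `(conserve, conditioning)` grid, and their effects are invented for illustration. On Windows the sketch falls back to one core, because `mclapply()` cannot fork there, so the comparison becomes serial against serial.

```r
library(parallel)

# Hypothetical stand-in for one d-sep run (not piecewiseSEM's dSep): one
# regression per claim, returning a data frame of p-values. The flags only
# mimic the (conserve, conditioning) grid; their effects are invented.
run_claims <- function(conserve, conditioning, ncores = 1L) {
  base <- if (conditioning) "mpg ~ wt + qsec" else "mpg ~ wt"
  claims <- c("cyl", "disp", "drat")
  one_claim <- function(v) {
    fit <- lm(as.formula(paste(base, "+", v)), data = mtcars)
    p <- coef(summary(fit))[v, "Pr(>|t|)"]
    if (conserve) min(1, 2 * p) else p
  }
  res <- if (ncores > 1L) {
    mclapply(claims, one_claim, mc.cores = ncores)
  } else {
    lapply(claims, one_claim)
  }
  data.frame(claim = claims, p = unlist(res))
}

# Fork-based mclapply cannot use more than one core on Windows.
ncores_par <- if (.Platform$OS.type == "windows") 1L else 2L

# Compare serial and parallel output for all four flag combinations.
grid <- expand.grid(conserve = c(FALSE, TRUE), conditioning = c(FALSE, TRUE))
same <- mapply(function(cons, cond) {
  identical(run_claims(cons, cond, ncores = 1L),
            run_claims(cons, cond, ncores = ncores_par))
}, grid$conserve, grid$conditioning)

print(cbind(grid, identical = same))
stopifnot(all(same))
```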
Adds an `ncores` argument to summary.psem(), fisherC(), and AIC_psem(),
so users can opt into parallel d-sep evaluation from the natural entry
points without calling dSep() directly:
summary(SEM, ncores = 4)
The argument is purely a pass-through; there is no new parallel logic in
these three functions. Each simply forwards ncores to its downstream calls:
* summary.psem -> dSep, fisherC, AIC_psem
* fisherC -> dSep (only when dTable is a psem; ignored otherwise)
* AIC_psem -> fisherC (only relevant for AIC.type = "dsep")
Default is `ncores = 1L`, preserving the existing serial behavior
byte-identically. Verified end-to-end: summary(SEM, ncores=1) and
summary(SEM, ncores=2) produce identical() dTable, Cstat, and AIC slots
on lm/glm/lmer fixtures.
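A minimal sketch of the pass-through pattern, using hypothetical stubs in place of `summary.psem()`, `fisherC()`, and `dSep()`. Only the leaf would act on `ncores`; the intermediate functions just forward it. The stub computes Fisher's C as `-2 * sum(log(p))` on `2k` degrees of freedom for `k` claims, so the check can confirm that changing `ncores` leaves the statistic untouched.

```r
# Hypothetical stubs mirroring summary.psem -> fisherC -> dSep; not the
# package's code. Only the leaf would ever act on ncores.
dsep_stub <- function(pvals, ncores = 1L) {
  # In the real dSep() this is where ncores would pick serial vs parallel;
  # here we only record the value that arrived.
  attr(pvals, "ncores_seen") <- ncores
  pvals
}

fisher_stub <- function(pvals, ncores = 1L) {
  p <- dsep_stub(pvals, ncores = ncores)  # pass-through only
  C <- -2 * sum(log(p))                   # Fisher's C statistic
  df <- 2L * length(p)
  list(C = C, df = df,
       P = pchisq(C, df = df, lower.tail = FALSE),
       ncores_seen = attr(p, "ncores_seen"))
}

summary_stub <- function(pvals, ncores = 1L) {
  fisher_stub(pvals, ncores = ncores)     # pass-through only
}

serial <- summary_stub(c(0.5, 0.2), ncores = 1L)
par4 <- summary_stub(c(0.5, 0.2), ncores = 4L)
str(par4)
```

Keeping the default at `1L` at every level means callers that never mention `ncores` get exactly the old serial behavior.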
Regenerate Rd files with devtools::document() after the dSep refactor and ncores feature: adds @param ncores to dSep, summary.psem, fisherC, and AIC_psem, and adds respMap to the documented signature of the internal testBasisSetElements. Documentation only; no code change.
Summary
Add an opt-in `ncores` argument to `dSep()`, `summary.psem()`, `fisherC()`, and `AIC_psem()` so tests of directed separation can be evaluated in parallel. The default `ncores = 1L` preserves the existing serial behavior byte-identically.

Motivation
For SEMs with many independence claims (especially those fit with `lmer`/`glmmTMB`, where each `update()` is expensive), `summary(SEM)` can take many minutes because the d-sep loop runs sequentially, even though each claim is fully independent of the others.

Changes
* `refactor(dSep)`: hoist `formulaList` and the response→model lookup out of the per-iteration `testBasisSetElements` call. Zero behavior change; a small saving per claim.
* `feat(dSep)`: add an `ncores` argument; dispatch to `parallel::mclapply` (Linux/macOS) or `parallel::parLapply` on a PSOCK cluster (Windows) when `ncores > 1L`. The progress bar is disabled in parallel mode (workers cannot share a `txtProgressBar`). Input is validated, and worker errors are surfaced via `stop()`.
* `feat(summary,fisherC,AIC_psem)`: propagate `ncores` through the common entry points so users can write `summary(SEM, ncores = 4)`.
* `docs:` regenerate Rd from roxygen.

Backward compatibility
`ncores = 1L` is the default. Output verified byte-identical (`identical()`) to pre-refactor `main` on lm / glm / lmer SEMs across all four `(conserve, conditioning)` combinations (12 / 12 fixtures match). The parallel path (`ncores = 2L`, fork) is also byte-identical to the serial baseline on the same fixtures.

Benchmark (28-core Linux, microbenchmark, times = 10)
Speedup is sublinear for small basis sets, where fork overhead dominates, but grows quickly when individual claims are slow (mixed models) or numerous. Users with mixed-model SEMs of 30+ claims should see substantial gains; the fixtures benchmarked here under-represent that regime because the package's bundled datasets are small.
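The sublinear behavior follows from a simple cost model. The sketch below is an assumption for illustration, not a measurement and not derived from the PR's benchmark: `k` equally slow claims of `t` seconds each, `p` workers processing them in rounds, and a fixed startup overhead `o` for forking or starting a cluster. The helper `expected_speedup()` is hypothetical.

```r
# Toy cost model (an assumption, not measured data): k independent claims,
# each taking t seconds, spread over p workers, plus a fixed overhead o.
expected_speedup <- function(k, t, p, o) {
  serial <- k * t
  # Workers run in rounds; with equal claim times the wall-clock time is the
  # number of rounds times t, plus the one-off startup overhead.
  parallel_time <- ceiling(k / p) * t + o
  serial / parallel_time
}

# Few, fast claims: overhead dominates and parallel can be slower than serial.
expected_speedup(k = 6, t = 0.05, p = 4, o = 0.5)

# Many slow claims (e.g. mixed models): speedup approaches p.
expected_speedup(k = 32, t = 5, p = 4, o = 0.5)
```

Under this model the fixed overhead is amortized only when `k * t` is large relative to `o`, which matches the PR's observation that small basis sets see little benefit.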