Small-Sample Survey Research: Simulation-Validated Decision Framework

Project Overview

This repository contains a reproducible R-based research workflow for evaluating statistical methods in small-sample survey research. The central objective is to develop a simulation-validated decision framework for studies with sample sizes of 30 or fewer, where conventional large-sample assumptions are often implausible and method choice can materially alter substantive conclusions.

The current implementation focuses on two analytical settings that recur in small-sample survey work:

two-group comparisons
association analysis

A regression block is planned as a later phase of the project and is not part of the current manuscript-focused release.

Research Problem: Misuse of Large-Sample Methods in Small Samples

Small-sample survey studies are common in specialised populations, pilot evaluations, classroom studies, organisational diagnostics, and early-stage intervention research. In these settings, analysts frequently rely on procedures that were derived or justified under asymptotic conditions. When those conditions are not met, nominal error rates, interval coverage, and inferential stability may degrade in ways that are not transparent from a single applied analysis.

The practical problem is not only low power. It is also method selection under uncertainty: different combinations of outcome scale, distributional shape, and noise structure may favour different inferential strategies.

Why This Matters in Survey Research

Survey researchers regularly analyse:

short scales with limited response categories
skewed attitudinal or behavioural outcomes
modest subgroup comparisons
correlation-based evidence in exploratory instrument work

Under these conditions, the choice among parametric, nonparametric, bootstrap, and Bayesian approaches is rarely neutral. A method that performs adequately for approximately normal interval responses may behave differently for skewed Likert-type data with the same nominal sample size. This repository treats that choice as an empirical design problem rather than a matter of convention.

Methodological Contribution

The project contributes a structured simulation workflow that compares candidate methods across a controlled factorial design:

sample sizes: 10, 20, 30
data-generating distributions: normal and skewed
effects: null and moderate
measurement scales: interval and Likert
noise conditions: low and high
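As a rough illustration of the size of this design, the sketch below enumerates the scenario grid in Python rather than in the project's R code. It assumes the five factors are fully crossed; the dictionary keys and level labels are illustrative names, not identifiers taken from the repository.

```python
from itertools import product

# Factor levels as listed above. The keys and labels are illustrative,
# not identifiers from the project's R scripts.
factors = {
    "n": [10, 20, 30],
    "distribution": ["normal", "skewed"],
    "effect": ["null", "moderate"],
    "scale": ["interval", "likert"],
    "noise": ["low", "high"],
}

# Assumption: the design is fully crossed, giving one dictionary per scenario.
scenarios = [dict(zip(factors, levels)) for levels in product(*factors.values())]

print(len(scenarios))  # 3 * 2 * 2 * 2 * 2 = 48
print(scenarios[0])
```

Under that assumption, each analytical block is evaluated on 48 scenarios, half of which are null-effect scenarios used for Type I error and half moderate-effect scenarios used for power.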

Primary performance criteria:

Type I error
power
bias
confidence interval coverage
Monte Carlo standard errors for the estimated performance criteria
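To show why Monte Carlo standard errors are reported, the following Python sketch computes the standard binomial Monte Carlo standard error of an estimated rejection rate, sqrt(p(1 - p) / R) for R replications. The function name and the rejection counts are hypothetical; the replication counts 20 and 2000 are the defaults of the development and manuscript run modes described under How to Run the Project.

```python
import math

def rejection_rate_with_mcse(rejections, n_reps):
    """Return the estimated rejection rate and its Monte Carlo standard error."""
    p_hat = rejections / n_reps
    mcse = math.sqrt(p_hat * (1 - p_hat) / n_reps)
    return p_hat, mcse

# Hypothetical counts chosen to give a 5% rejection rate at each run size.
for n_reps in (20, 2000):
    rejections = round(0.05 * n_reps)
    p_hat, mcse = rejection_rate_with_mcse(rejections, n_reps)
    print(f"reps={n_reps}: rate={p_hat:.3f}, MCSE={mcse:.4f}")
```

With 20 replications the standard error is about as large as the 5% rate itself, whereas 2000 replications bring it down to roughly half a percentage point. This is one way to see why development-mode outputs are not suitable for substantive reporting.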

Descriptive summaries reported alongside the primary criteria:

mean estimates
mean p-values
mean Bayes factors

The current analytical blocks are:

Block A: Welch t-test, Mann-Whitney U, Bayesian t-test, bootstrap confidence interval
Block B: Pearson correlation, Spearman correlation, Bayesian correlation, bootstrap confidence interval

Design parameterisation follows the current manuscript protocol:

Block A holds the standardised group difference constant at Cohen's d = 0.50 under the moderate-effect condition, with the raw mean shift scaled by the scenario-specific outcome standard deviation.
Block B holds the latent correlation at 0.35 under the moderate-effect condition; observed-scale truth values are re-estimated after measurement noise is added, so attenuation is treated as part of the data-generating process rather than ignored.
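The two parameterisations can be sketched as follows, in Python rather than the project's R code. This is a minimal illustration under stated assumptions: measurement noise is taken to be additive, independent, and normal with a common standard deviation, and the function names, the outcome standard deviation of 1.2, and the noise standard deviation of 0.5 are hypothetical values chosen for the example.

```python
import math
import random

def raw_mean_shift(d, outcome_sd):
    """Raw group difference corresponding to a standardised effect d."""
    return d * outcome_sd

def observed_correlation(latent_rho, noise_sd, n_draws, seed):
    """Approximate the observed-scale correlation from a large simulated sample."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_draws):
        # Latent standard-normal pair with correlation latent_rho.
        z1 = rng.gauss(0, 1)
        z2 = latent_rho * z1 + math.sqrt(1 - latent_rho**2) * rng.gauss(0, 1)
        # Assumption: independent additive measurement noise on each score.
        xs.append(z1 + rng.gauss(0, noise_sd))
        ys.append(z2 + rng.gauss(0, noise_sd))
    mean_x = sum(xs) / n_draws
    mean_y = sum(ys) / n_draws
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(raw_mean_shift(0.50, 1.2))
print(observed_correlation(0.35, 0.5, 10_000, seed=1))
```

Under these assumptions the observed correlation converges to 0.35 / (1 + 0.5^2) = 0.28, below the latent value of 0.35. Bias and coverage would be misstated if they were scored against the latent value instead of the observed-scale one.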

Repository Structure

small-sample-survey-framework/
├─ README.md
├─ LICENSE
├─ .gitignore
├─ renv.lock
├─ small-sample-survey-framework.Rproj
├─ data/
├─ output/
│  ├─ tables/
│  ├─ figures/
│  ├─ logs/
│  └─ derived/
├─ scripts/
├─ functions/
├─ quarto/
├─ manuscript/
└─ docs/

How to Run the Project

Open small-sample-survey-framework.Rproj in RStudio or use the project root in a terminal session.
Run Rscript scripts/99_run_all.R.
Review generated outputs in output/tables, output/figures, and output/derived.

The codebase supports two explicit execution modes:

Development mode is the default and is intended for fast verification of the pipeline. It currently uses 20 replications, 199 bootstrap resamples, and 2000 truth draws. The outputs committed to the repository are starter artefacts from this mode, not manuscript-scale results.
Manuscript mode is intended for substantive reporting. It defaults to 2000 replications, 1999 bootstrap resamples, and 10000 truth draws. In manuscript mode, the setup script enforces a minimum of 999 bootstrap resamples.

Environment variables that control a run:

SMALL_SAMPLE_RUN_MODE
SMALL_SAMPLE_N_REPS
SMALL_SAMPLE_BOOT_REPS
SMALL_SAMPLE_TRUTH_DRAWS
SMALL_SAMPLE_SEED
SMALL_SAMPLE_BF_THRESHOLD
SMALL_SAMPLE_BF_SENSITIVITY_THRESHOLD
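The interaction between run mode, per-mode defaults, and overrides can be sketched as below. This is an illustrative Python sketch, not the project's R setup script. It assumes that the mode values are the strings "development" and "manuscript", that the numeric variables override the mode defaults, and that the 999-resample minimum is enforced by raising an error; none of these details is confirmed by this README. The seed and Bayes-factor threshold variables are omitted because their defaults are not stated here.

```python
import os

# Per-mode defaults as described above. The mode strings are assumptions.
MODE_DEFAULTS = {
    "development": {"n_reps": 20, "boot_reps": 199, "truth_draws": 2000},
    "manuscript": {"n_reps": 2000, "boot_reps": 1999, "truth_draws": 10000},
}
MIN_MANUSCRIPT_BOOT_REPS = 999

def resolve_run_config(env=None):
    """Combine the run mode's defaults with any environment overrides."""
    env = os.environ if env is None else env
    mode = env.get("SMALL_SAMPLE_RUN_MODE", "development")
    if mode not in MODE_DEFAULTS:
        raise ValueError(f"unknown run mode: {mode!r}")
    defaults = MODE_DEFAULTS[mode]
    config = {
        "mode": mode,
        "n_reps": int(env.get("SMALL_SAMPLE_N_REPS", defaults["n_reps"])),
        "boot_reps": int(env.get("SMALL_SAMPLE_BOOT_REPS", defaults["boot_reps"])),
        "truth_draws": int(env.get("SMALL_SAMPLE_TRUTH_DRAWS", defaults["truth_draws"])),
    }
    # Assumption: a too-small bootstrap count in manuscript mode is an error.
    if mode == "manuscript" and config["boot_reps"] < MIN_MANUSCRIPT_BOOT_REPS:
        raise ValueError("manuscript mode requires at least 999 bootstrap resamples")
    return config

# A plain dictionary stands in for the process environment in this example.
print(resolve_run_config({}))
print(resolve_run_config({"SMALL_SAMPLE_RUN_MODE": "manuscript",
                          "SMALL_SAMPLE_N_REPS": "500"}))
```

Passing a dictionary instead of reading os.environ directly keeps the example deterministic; calling resolve_run_config() with no argument reads the real environment.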

Dependency snapshots are intentionally separate from the main pipeline. After a deliberate change to package versions, run Rscript scripts/98_snapshot_environment.R to update renv.lock.

Outputs

The workflow is designed to produce:

scenario-level simulation summaries as CSV tables
reproducible RDS objects containing raw and aggregated simulation results
truth tables that document the observed-scale estimands under each scenario
Bayes-factor threshold sensitivity tables for thresholds of 3 and 10
comparison figures in PNG and PDF formats
execution logs for transparent run tracking

Reproducibility Statement

This repository is structured as a reproducible research project and was developed against R 4.5.3. All simulations, summaries, and figures are generated from source scripts. Project dependencies are managed with renv, paths are project-relative, and random seeds are set explicitly for the simulation and bootstrap components.

The current workflow requests Bayes factors only and does not request posterior samples from BayesFactor. In the package version recorded by renv.lock, ttestBF() computes Bayes factors by Gaussian quadrature and correlationBF() computes Bayes factors through deterministic numerical routines when posterior = FALSE. Reproducibility therefore depends on fixed package versions and numerical libraries rather than on MCMC output.

Planned Journal Submission

The repository is being developed as the computational companion to a methods paper on statistical decision-making for small-sample survey studies. The intended manuscript will report the simulation design, performance criteria, Monte Carlo precision, practical decision rules, and an applied validation phase using empirical survey data.

Citation Placeholder

Formal citation metadata will be added upon preprint release or journal submission. Until then, please cite the repository by title and URL.
