This repository contains the experiment runners, archived result artifacts, and result-processing scripts used for the thesis on Adaptive Episodic Exploration Scheduling (AEES) — a small per-axis bandit that adapts a learning-rate multiplier and a gradient-noise standard deviation per training episode on top of a standard optimizer backbone.
The reusable AEES implementation is not in this repository. It is published on PyPI as pulseopt and maintained at davidkfoss/pulseopt. This repository pulls it in as a normal dependency.
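For orientation, here is a minimal sketch of the episodic per-axis bandit idea that AEES is built on. This is an illustrative assumption-laden toy, not pulseopt's API: the class name `EpsilonGreedyAxis`, the epsilon-greedy rule, and the running-mean update are stand-ins for whatever the library actually implements.

```python
import random

class EpsilonGreedyAxis:
    """Illustrative per-axis bandit: picks one candidate value per episode.

    Sketch only -- the real controllers, reward shaping, and API live in
    the pulseopt package.
    """

    def __init__(self, candidates, epsilon=0.1):
        self.candidates = list(candidates)
        self.epsilon = epsilon
        self.counts = [0] * len(self.candidates)
        self.values = [0.0] * len(self.candidates)
        self.current = 0

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best arm.
        if random.random() < self.epsilon:
            self.current = random.randrange(len(self.candidates))
        else:
            self.current = max(range(len(self.candidates)),
                               key=lambda i: self.values[i])
        return self.candidates[self.current]

    def update(self, reward):
        # Running mean of the reward for the arm used this episode.
        i = self.current
        self.counts[i] += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]

# One axis for the LR multiplier, one for the gradient-noise std.
lr_axis = EpsilonGreedyAxis([0.5, 1.0, 2.0])
noise_axis = EpsilonGreedyAxis([0.0, 0.005, 0.01])

for episode in range(3):
    lr_mult, noise_std = lr_axis.select(), noise_axis.select()
    # ... run one episode of training steps with lr * lr_mult and Gaussian
    # gradient noise of std noise_std, then compute an episode reward ...
    reward = 0.0  # placeholder
    lr_axis.update(reward)
    noise_axis.update(reward)
```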
Python 3.11 is required. The commands below use a standard virtual environment; the experiment examples use uv run python, but the same runners can also be launched with python inside an activated environment.
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e .[dev]

`pip install -e .` declares this directory's dependencies, including `pulseopt>=0.1.5`; the project itself ships no reusable Python package.
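To confirm that the dependency resolved as expected, the installed pulseopt version can be checked through the standard distribution metadata (assuming the activated environment from above):

```python
from importlib.metadata import version

# Should print a version satisfying the pulseopt>=0.1.5 constraint.
print(version("pulseopt"))
```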
Each runner exposes its full flag set under --help:
uv run python experiments/task_cifar100.py --help
uv run python experiments/task_sst2.py --help
uv run python experiments/task_agnews.py --help

A run writes a RunResult JSON under results/ via experiments/utils/results.py. The runners cache HuggingFace datasets/models under .hf_cache/ and pre-tokenized datasets under data/; both directories are gitignored.
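As a quick sanity check, a written result file can be inspected with plain `json`. The exact RunResult fields are defined in experiments/utils/results.py, so this sketch only prints whatever top-level keys the schema contains; it assumes at least one completed run under results/:

```python
import json
from pathlib import Path

# Pick the most recently written run-result file.
latest = max(Path("results").glob("*.json"), key=lambda p: p.stat().st_mtime)
result = json.loads(latest.read_text())
print(latest.name, sorted(result))  # top-level RunResult keys
```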
Key flags exposed by all runners, with the full list available through --help:
- `--lr-candidates`, `--noise-candidates` — comma-separated candidate values; single-candidate axes are treated as fixed constants and skip controller creation (see the sketch after this list).
- `--structured-control-mode {independent,conditional}`
- `--context-mode {none,trend}` — context modes for contextual controller variants.
- `--episode-length`, `--reward-instability-lambda`, `--reward-clip-{min,max}`
- `--lr-scheduler {none,cosine,linear,warmup_linear}`, `--scheduler-t-max`, `--warmup-epochs`
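A minimal sketch of the single-candidate rule described in the first item; the helper name `parse_candidates` is hypothetical, not the runners' actual code:

```python
def parse_candidates(flag_value: str) -> list[float]:
    """Parse a comma-separated candidate list such as "0.5,1.0,2.0"."""
    return [float(v) for v in flag_value.split(",")]

lr_candidates = parse_candidates("0.5,1.0,2.0")
noise_candidates = parse_candidates("0.0")

# With a single candidate, the axis is a fixed constant: no controller is
# created and the lone value is applied unconditionally every episode.
needs_lr_controller = len(lr_candidates) > 1       # True
needs_noise_controller = len(noise_candidates) > 1  # False
```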
CIFAR-specific flags include --label-noise-type {none,symmetric,asymmetric}, --label-noise-rate, and --control-mode {baseline,adaptive,random}. SST-2 and AG News use --method {AdamW,AdaptiveScheduler,RandomScheduler}.
The SST-2 and AG News runners use HuggingFace datasets, tokenizers, and pretrained DistilBERT weights. Dataset/model files are cached under .hf_cache/, while pre-tokenized datasets can be saved under data/.
For runtime measurements, downloads and tokenization should be completed before the measured training run. The recommended pattern is:
- Run `--pretokenize-only` with `--cache-dir` and `--tokenized-dataset-dir` to download/cache the dataset and tokenizer files and save the tokenized dataset.
- Run a short non-local smoke job with the same `--cache-dir` and `--tokenized-dataset-dir` to download/cache the pretrained model weights and verify that the runner works end-to-end.
- Run the actual measured jobs with the same `--cache-dir`, `--tokenized-dataset-dir`, and `--local-files-only`.
Example for AG News:
uv run python experiments/task_agnews.py \
--pretokenize-only \
--epochs 1 \
--batch-size 16 \
--max-length 128 \
--lr 5e-5 \
--weight-decay 0.01 \
--cache-dir .hf_cache \
--tokenized-dataset-dir data/agnews_tokenized

Then the non-local smoke run, which downloads and caches the model weights:

uv run python experiments/task_agnews.py \
--method AdamW \
--epochs 1 \
--batch-size 16 \
--max-length 128 \
--lr 5e-5 \
--weight-decay 0.01 \
--lr-scheduler none \
--cache-dir .hf_cache \
--tokenized-dataset-dir data/agnews_tokenized \
--seed 0 \
--output results/agnews_smoke_seed0.json

Then the measured local-only run:

uv run python experiments/task_agnews.py \
--method AdamW \
--epochs 1 \
--batch-size 16 \
--max-length 128 \
--lr 5e-5 \
--weight-decay 0.01 \
--lr-scheduler none \
--cache-dir .hf_cache \
--tokenized-dataset-dir data/agnews_tokenized \
--local-files-only \
--seed 0 \
--output results/agnews_local_smoke_seed0.json

For SST-2, use the same sequence with experiments/task_sst2.py and --tokenized-dataset-dir data/sst2_tokenized.
This sequence separates one-time dataset download, tokenization, and model-weight download from runtime-sensitive training runs. It also makes repeated experiments faster and avoids unnecessary repeated requests to the HuggingFace Hub. The measured compute-overhead experiments should use the cached HuggingFace files via --cache-dir, the saved tokenized datasets via --tokenized-dataset-dir, and --local-files-only to avoid network access during the run.
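Before a measured run, it can be worth verifying in-process that everything needed is already on disk. A sketch using the HuggingFace `datasets` API and the `HF_HUB_OFFLINE` environment variable; the paths mirror the flags used above and are assumptions about this repository's layout:

```python
import os
from pathlib import Path

# Refuse Hub network access for this process (mirrors --local-files-only);
# must be set before importing the HuggingFace libraries.
os.environ["HF_HUB_OFFLINE"] = "1"

from datasets import load_from_disk

assert Path(".hf_cache").is_dir(), "run the pretokenize and smoke steps first"

# The tokenized dataset saved by --pretokenize-only loads without network access.
ds = load_from_disk("data/agnews_tokenized")
print(ds)
```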
Representative clean CIFAR-100 AdamW baseline:
uv run python experiments/task_cifar100.py \
--control-mode baseline \
--optimizer AdamW \
--label-noise-type none \
--seed 0

Representative noisy CIFAR-100 AEES-LR run:
uv run python experiments/task_cifar100.py \
--control-mode adaptive \
--optimizer AdamW \
--label-noise-type symmetric \
--label-noise-rate 0.4 \
--episode-length 200 \
--lr-candidates 0.5,1.0,2.0 \
--noise-candidates 0.0 \
--seed 0

Representative SST-2 warmup-linear AdamW baseline:
uv run python experiments/task_sst2.py \
--method AdamW \
--lr-scheduler warmup_linear \
--seed 0

Representative AG News noise-only AEES run:
uv run python experiments/task_agnews.py \
--method AdaptiveScheduler \
--lr-scheduler warmup_linear \
--lr-candidates 1.0 \
--noise-candidates 0.0,0.005,0.01 \
--label-noise-rate 0.2 \
--episode-length 200 \
--seed 0

These commands are representative single-run examples. The thesis tables are generated from archived result artifacts rather than by launching experiments from this README.
scripts/tables/ and scripts/plots/ regenerate the thesis tables and figures by reading archived run-result JSON files. These scripts are read-only consumers: they do not launch training runs. scripts/report_*.py produce summary CSVs over a results directory.
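In the same spirit as scripts/report_*.py, here is a minimal sketch of a read-only summary pass over a results directory. The field names `seed` and `final_accuracy` are illustrative placeholders; the real keys come from the RunResult schema in experiments/utils/results.py:

```python
import csv
import json
from pathlib import Path

rows = []
for path in sorted(Path("results").glob("*.json")):
    run = json.loads(path.read_text())
    # Placeholder field names; substitute the actual RunResult keys.
    rows.append({"file": path.name,
                 "seed": run.get("seed"),
                 "final_accuracy": run.get("final_accuracy")})

# Write a flat CSV summary without modifying any result artifact.
with open("results_summary.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["file", "seed", "final_accuracy"])
    writer.writeheader()
    writer.writerows(rows)
```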
Example help commands:
uv run python scripts/tables/make_cifar_clean_tables.py --help
uv run python scripts/plots/cifar_plots.py --help

The archived result files are the authoritative source for the numerical tables reported in the thesis. Re-running training with the same seeds should reproduce the same qualitative behavior and similar aggregate results, but exact trajectory-level or bitwise reproduction across machines is not guaranteed.
Tests covering the experiment-side RunResult runtime-metric derivation:
uv run pytest tests/test_runtime_metrics.py

Library-side tests for controllers, episode management, rewards, and optimizer wrapping live with the pulseopt source repository.
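For flavor, a self-contained pytest-style sketch of the kind of derivation such tests might cover; `derive_seconds_per_step` is an inline stand-in, not the repository's actual helper or test:

```python
def derive_seconds_per_step(total_wall_seconds: float, num_steps: int) -> float:
    """Inline stand-in for a runtime-metric derivation helper."""
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    return total_wall_seconds / num_steps

def test_seconds_per_step():
    assert derive_seconds_per_step(120.0, 600) == 0.2

def test_rejects_zero_steps():
    import pytest
    with pytest.raises(ValueError):
        derive_seconds_per_step(120.0, 0)
```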
- `experiments/task_*.py` — three thesis runners: CIFAR-100, SST-2, and AG News.
- `experiments/utils/{flops,metrics,results,run_plan}.py` — experiment-side helpers for FLOP accounting, `RunResult` serialization, JSON IO, and run-plan/manifest utilities.
- `scripts/tables/`, `scripts/plots/`, `scripts/report_*.py` — reporting scripts over archived result JSON files.
- `tests/test_runtime_metrics.py` — runtime-metric derivation tests.
- `data/`, `.hf_cache/` — local dataset/model/tokenization caches, gitignored.
- `results/` — local run outputs. Selected compact archived result artifacts used by the thesis are tracked; large temporary outputs, checkpoints, caches, and raw logs are not tracked.
- Fixed run seeds and deterministic label-noise construction are used (a sketch of deterministic symmetric label noise follows this list).
- Seed handling, label-noise protocols, scheduler settings, reward shaping, and gradient-noise generator construction are defined by `pulseopt` and the experiment runners.
- Newer result files log hardware/software metadata under `runtime_metrics.hardware`: GPU name(s) and memory, Python version, PyTorch and CUDA versions, cuDNN version, plus the runner's `num_workers` and `pin_memory` settings.
- Some older archived result files predate that capture and may not contain a complete hardware/software metadata block.
- Exact epoch-level or bitwise reproduction across machines is not guaranteed. GPU architecture, CUDA/cuDNN kernels, PyTorch/torchvision versions, and DataLoader/runtime behavior can introduce small trajectory differences.
- For NLP experiments, tokenized datasets and HuggingFace model/tokenizer files should be cached before runtime-sensitive experiments. `--pretokenize-only` prepares tokenized datasets, and measured runs should use `--cache-dir`, `--tokenized-dataset-dir`, and `--local-files-only` once the required files are cached.
- The `pulseopt` dependency is pinned through `pyproject.toml`; changing the pinned version changes the AEES implementation backing the runners.
- Archived result files are treated as the authoritative source for the thesis tables and figures; the reporting scripts in `scripts/` consume them read-only.
- Datasets and model/tokenizer caches are pulled into `.hf_cache/` and `data/` on first run. These directories are gitignored.
- Checkpoints, raw logs, and bulk temporary outputs are not intended to be tracked in git.
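As referenced in the first note above, here is a minimal sketch of deterministic symmetric label noise under a fixed seed. The exact protocol used by the runners is defined in the experiment code and pulseopt, so treat this as illustrative only:

```python
import numpy as np

def symmetric_label_noise(labels, num_classes, rate, seed):
    """Flip each label to a uniformly chosen *different* class with
    probability `rate`, deterministically for a given seed."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    flip = rng.random(labels.shape[0]) < rate
    # Draw an offset in [1, num_classes) so the new label always differs.
    offsets = rng.integers(1, num_classes, size=labels.shape[0])
    labels[flip] = (labels[flip] + offsets[flip]) % num_classes
    return labels

# Same seed -> identical noisy labels on every machine.
noisy = symmetric_label_noise([0, 1, 2, 3] * 25, num_classes=100, rate=0.4, seed=0)
```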
The reusable pulseopt package contains the public AEES API, quick-start examples, and design notes; see davidkfoss/pulseopt.
MIT — see LICENSE.