OrderedLearning

An experiment framework for studying how data ordering strategies affect neural network learning. Companion code for the paper: The Order Is The Message.

Overview

OrderedLearning provides a unified system for running ordered dataset learning experiments with:

Standardized training loops -- experiments define building blocks; the framework handles orchestration
19 instrumentation hooks for observing and intervening in training, with 4 sink types (console, CSV, JSONL, W&B)
Rich console output with full-screen live display, silent, and headless modes
Strict reproducibility -- deterministic seeding, environment metadata capture, config snapshots

The primary experiment is mod_arithmetic, which trains a small transformer on modular addition (a + b) mod p and studies how data ordering affects the grokking phenomenon -- the delayed generalization that emerges long after memorization.

Installation

git clone https://github.com/JordanRL/OrderedLearning.git
cd OrderedLearning

# To reproduce paper results, use the exact code version:
git checkout paper-data-v3

# For general use (allows compatible newer versions):
pip install -r requirements.txt

# For exact reproducibility (pinned versions matching the paper):
pip install -r requirements-lock.txt

requirements.txt uses >= minimum bounds for general compatibility. requirements-lock.txt pins exact versions so that runs on matching hardware can be bit-identical. Use the lock file when replicating paper results; use the regular file for development or when running on newer hardware that may need updated packages.

Requires Python 3.11+ and PyTorch 2.0+. The mod_arithmetic experiment can run on CPU, though the paper runs used an NVIDIA 4090 GPU on RunPod. LM experiments require a CUDA-capable GPU. Apple Silicon (MPS) is supported as a fallback when CUDA is unavailable.

Replicating the Paper Experiment

The mod_arithmetic experiment is self-contained -- no external data or GPU required. This trains a 2-layer transformer on (a + b) mod 9973 using all four data ordering strategies (stride, random, fixed-random, target) and reports grokking behavior for each.
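To make the idea of an ordering strategy concrete, here is a minimal sketch. This README does not define the four strategies, so the semantics below are assumptions inferred from the names only: `random` draws a fresh permutation every epoch, and `fixed-random` reuses one permutation. The helpers `make_pairs` and `epoch_order` are hypothetical and are not part of the framework. The `stride` and `target` strategies are left out because their definitions cannot be inferred from this document.

```python
import random


def make_pairs(p):
    """Return every (a, b, (a + b) % p) triple for a small modulus p."""
    return [(a, b, (a + b) % p) for a in range(p) for b in range(p)]


def epoch_order(n, strategy, epoch, seed):
    """Return the visiting order over n examples for one epoch.

    Assumed semantics (not taken from the repository):
      fixed-random -> one seeded permutation, reused every epoch
      random       -> a new seeded permutation for each epoch
    """
    if strategy == "fixed-random":
        rng = random.Random(f"{seed}")
    elif strategy == "random":
        rng = random.Random(f"{seed}-{epoch}")
    else:
        raise ValueError(f"strategy not covered by this sketch: {strategy}")
    order = list(range(n))
    rng.shuffle(order)
    return order


pairs = make_pairs(7)  # tiny modulus so the example runs instantly
fixed_0 = epoch_order(len(pairs), "fixed-random", epoch=0, seed=199)
fixed_1 = epoch_order(len(pairs), "fixed-random", epoch=1, seed=199)
rand_0 = epoch_order(len(pairs), "random", epoch=0, seed=199)
rand_1 = epoch_order(len(pairs), "random", epoch=1, seed=199)

print(fixed_0 == fixed_1)  # the fixed order repeats across epochs
print(rand_0 == rand_1)    # the random order is redrawn each epoch
```

Both orderings visit exactly the same examples; only the sequence differs, which is the variable the experiment studies.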

Important: To reproduce paper results, make sure you are on the paper-data-v3 tag (git checkout paper-data-v3). Later commits may change framework behavior.

Exact Commands Used For Paper Results

First, these commands were run to generate the solution weights for each strategy:

python run_experiment.py mod_arithmetic --strategy stride --with-hooks minimal --seed 199 --save-checkpoints
python run_experiment.py mod_arithmetic --strategy random --with-hooks minimal --seed 199 --save-checkpoints
python run_experiment.py mod_arithmetic --strategy target --with-hooks minimal --seed 199 --save-checkpoints
python run_experiment.py mod_arithmetic --strategy fixed-random --with-hooks minimal --seed 199 --save-checkpoints

Reference weights for solution-dependent metrics are resolved automatically from the output directory (e.g., output/mod_arithmetic/stride/stride_final.pt).
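The path convention in the example above can be sketched with `pathlib`. The helper name `reference_weights_path` and its `output_root` parameter are hypothetical; the framework's own resolution logic may differ.

```python
from pathlib import Path


def reference_weights_path(experiment, strategy, output_root="output"):
    """Build the conventional location of a strategy's final weights file."""
    return Path(output_root) / experiment / strategy / f"{strategy}_final.pt"


path = reference_weights_path("mod_arithmetic", "stride")
print(path.as_posix())  # output/mod_arithmetic/stride/stride_final.pt
```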

Then the fully instrumented experiment was run on the same pod:

python run_experiment.py mod_arithmetic --strategy stride --with-hooks full --seed 199 --hook-jsonl --validate-checkpoints
python run_experiment.py mod_arithmetic --strategy random --with-hooks full --seed 199 --hook-jsonl --validate-checkpoints
python run_experiment.py mod_arithmetic --strategy target --with-hooks full --seed 199 --hook-jsonl --validate-checkpoints
python run_experiment.py mod_arithmetic --strategy fixed-random --with-hooks full --seed 199 --hook-jsonl --validate-checkpoints

Key configuration flags

Flag	Default	Description
`--strategy`	`all`	Ordering strategy: `stride`, `random`, `fixed-random`, `target`, `all`
`--epochs`	`5000`	Total training epochs
`--p`	`9973`	Prime modulus
`--batch-size`	`256`	Batch size
`--seed`	`199`	Random seed
`--stride`	`floor(sqrt(p))`	Stride value for `stride` ordering
`--snapshot-every`	`10`	Interval for hook snapshots
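As a quick check of the defaults in the table, the snippet below evaluates `floor(sqrt(p))` for the default modulus and confirms that 9973 is prime. The `is_prime` helper is an illustrative trial-division check, not framework code.

```python
import math


def is_prime(n):
    """Trial division; adequate for a four-digit modulus."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, math.isqrt(n) + 1))


p = 9973
default_stride = math.isqrt(p)  # integer floor of the square root

print(default_stride)  # 99
print(is_prime(p))     # True
```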

With full instrumentation

# Enable all hooks and write metrics to JSONL
python run_experiment.py mod_arithmetic --with-hooks full --hook-jsonl

# Full instrumentation with W&B logging
python run_experiment.py mod_arithmetic --with-hooks full --hook-wandb my_project

# Full instrumentation with live display
python run_experiment.py mod_arithmetic --strategy stride --with-hooks full --hook-jsonl --live

Expected output

Each strategy run produces files in output/mod_arithmetic/{strategy}/:

File	Description
`experiment_config.json`	Full config + environment metadata
`summary.json`	Initial and final evaluation metrics
`{strategy}_final.pt`	Final model weights
`{strategy}.csv`	Hook metrics time series (with `--hook-csv`)
`{strategy}.jsonl`	Hook metrics in JSONL (with `--hook-jsonl`)
`checkpoints/`	Periodic checkpoints (with `--save-checkpoints`)
`traj.pt`	Parameter trajectory (with `--record-trajectory`)
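The JSONL output can be read with the standard library, one JSON object per line. This is a minimal sketch: the `load_jsonl` helper is hypothetical, and the `step` and `loss` fields in the demo file are made up for illustration, since the record schema is not described here.

```python
import json
import tempfile
from pathlib import Path


def load_jsonl(path):
    """Parse a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo on a throwaway file; field names here are invented for the example.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "stride.jsonl"
    demo.write_text(
        '{"step": 0, "loss": 9.2}\n{"step": 10, "loss": 8.7}\n',
        encoding="utf-8",
    )
    records = load_jsonl(demo)

print(len(records), records[-1]["loss"])  # 2 8.7
```

For a real run, the same helper would be pointed at a file such as output/mod_arithmetic/stride/stride.jsonl.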

Reproducibility and Determinism

A single --seed value controls all random number generators, and the framework enforces deterministic execution throughout:

Python random, NumPy, PyTorch CPU and CUDA RNGs are all seeded
torch.use_deterministic_algorithms(True) is enforced (not warn_only)
cuDNN: deterministic=True, benchmark=False
cuBLAS: workspace config set to :4096:8 for deterministic reductions
Flash attention and memory-efficient attention are disabled (non-deterministic)
Hook RNG state is saved and restored around hook execution to prevent hooks from perturbing training
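The last point, saving and restoring RNG state around hooks, can be illustrated with Python's `random` module alone. This is a sketch of the general technique, not the framework's implementation, which presumably also covers the NumPy and PyTorch generators. The name `preserve_rng_state` is hypothetical.

```python
import random
from contextlib import contextmanager


@contextmanager
def preserve_rng_state():
    """Snapshot the global Python RNG state and restore it on exit."""
    state = random.getstate()
    try:
        yield
    finally:
        random.setstate(state)


# Run 1: a "hook" consumes random numbers inside the guarded block.
random.seed(199)
with preserve_rng_state():
    _ = [random.random() for _ in range(5)]
with_hook = random.random()

# Run 2: no hook at all.
random.seed(199)
without_hook = random.random()

print(with_hook == without_hook)  # True
```

Because the state is restored, training draws the same random numbers whether or not hooks are enabled, so instrumented and uninstrumented runs stay comparable.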

Each run captures full environment metadata in experiment_config.json:

{
  "environment": {
    "torch_version": "2.x.x",
    "cuda_version": "12.x",
    "cudnn_version": "...",
    "gpu_name": "NVIDIA ...",
    "gpu_capability": "8.9",
    "float32_matmul_precision": "high",
    "cudnn_deterministic": true,
    "cudnn_benchmark": false,
    "cublas_workspace_config": ":4096:8"
  }
}

Guarantee: Same seed + same hardware + same software versions + same code = bit-identical results.

What can differ: Different GPU architectures, different CUDA/cuDNN versions, or different PyTorch versions may produce numerically different results due to implementation differences in low-level kernels. The framework warns when loading reference weights from a different environment.
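A check of this kind can be sketched as a comparison of two environment dictionaries. The helper `environment_mismatches` is hypothetical and the version values below are invented; only the key names come from the metadata example above.

```python
def environment_mismatches(saved, current):
    """Return {key: (saved_value, current_value)} for every differing key."""
    keys = sorted(set(saved) | set(current))
    return {
        key: (saved.get(key), current.get(key))
        for key in keys
        if saved.get(key) != current.get(key)
    }


# Invented values, for illustration only.
saved_env = {"torch_version": "2.1.0", "cuda_version": "12.1", "cudnn_benchmark": False}
current_env = {"torch_version": "2.1.0", "cuda_version": "12.4", "cudnn_benchmark": False}

diff = environment_mismatches(saved_env, current_env)
for key, (old, new) in diff.items():
    print(f"warning: {key} differs (saved={old}, current={new})")
```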

Running Your Own Experiments

The scaffold generator creates a new experiment package with all required files:

python create_experiment.py

This walks you through experiment setup (base class, strategies, training step type, hyperparameters) and generates a package under experiments/ with stub implementations ready to fill in.

Experiment package structure

experiments/{name}/
├── __init__.py       # imports runner to trigger @ExperimentRegistry.register
├── runner.py         # runner class extending LMRunner or GrokkingRunner
├── config.py         # @dataclass extending BaseConfig
├── generator.py      # DatasetGenerator subclass
└── loader.py         # DatasetLoader subclass (if needed)

Experiments are auto-discovered -- any package under experiments/ that imports a registered runner class will appear in python run_experiment.py --list.
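Decorator-based registration of this kind generally works as sketched below. `SketchRegistry` is a stand-in written for this example; the real `ExperimentRegistry` may take different arguments or store different data.

```python
class SketchRegistry:
    """Illustrative registry: maps experiment names to runner classes."""

    _runners = {}

    @classmethod
    def register(cls, name):
        """Return a class decorator that records the runner under `name`."""
        def decorator(runner_cls):
            cls._runners[name] = runner_cls
            return runner_cls
        return decorator

    @classmethod
    def names(cls):
        return sorted(cls._runners)


# Importing the module that contains this class is what triggers registration.
@SketchRegistry.register("my_experiment")
class MyRunner:
    pass


print(SketchRegistry.names())  # ['my_experiment']
```

This is why the package's __init__.py must import the runner: the decorator only runs when the module defining the class is imported.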

Instrumentation Hooks

The framework includes 19 training hooks for deep observability into the learning process. Hooks fire at lifecycle points during training, compute metrics, and dispatch them to configurable sinks.

# Enable a curated hook group
python run_experiment.py mod_arithmetic --with-hooks full --hook-jsonl

# Enable specific hooks
python run_experiment.py mod_arithmetic --hooks norms fourier attention --hook-jsonl

# List all available hooks
python run_experiment.py mod_arithmetic --hooks-list

# Describe a hook's metrics
python run_experiment.py mod_arithmetic --hooks-describe fourier

See docs/instrumentation-hooks.md for the full hook reference.
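The hook-and-sink flow described in this section can be pictured with a small sketch. All names here (`ListSink`, `WeightNormHook`, `run_hooks`, the `norms/l2` metric key) are hypothetical and do not reflect the framework's actual interfaces.

```python
import math


class ListSink:
    """Collects metric rows in memory; stands in for a CSV or JSONL sink."""

    def __init__(self):
        self.rows = []

    def write(self, step, metrics):
        self.rows.append({"step": step, **metrics})


class WeightNormHook:
    """Hypothetical hook that reports the L2 norm of a flat weight list."""

    def on_snapshot(self, step, weights):
        return {"norms/l2": math.sqrt(sum(w * w for w in weights))}


def run_hooks(hooks, sinks, step, weights):
    """Fire every hook, merge their metrics, and send them to every sink."""
    metrics = {}
    for hook in hooks:
        metrics.update(hook.on_snapshot(step, weights))
    for sink in sinks:
        sink.write(step, metrics)


sink = ListSink()
run_hooks([WeightNormHook()], [sink], step=10, weights=[3.0, 4.0])
print(sink.rows)  # [{'step': 10, 'norms/l2': 5.0}]
```

Keeping hooks (what to measure) separate from sinks (where to write) is what allows the same metrics to go to the console, CSV, JSONL, or W&B without changing the hook code.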

Analysis Tools

Post-experiment analysis tools are provided for visualizing metrics, comparing strategies, and exporting publication-ready tables. See docs/getting-started.md for a walkthrough.

# List available analysis tools
python analyze_experiment.py --list

# Plot training metrics
python analyze_experiment.py mod_arithmetic metric_plot \
    --metrics training_metrics/loss training_metrics/val_acc \
    --layout overlay --smooth 0.9

# Export a LaTeX comparison table
python analyze_experiment.py mod_arithmetic export_table \
    --metrics training_metrics/val_acc --format latex
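The meaning of `--smooth 0.9` is not specified in this README. One common convention, assumed here purely for illustration, is an exponential moving average in which the value is the weight given to the previous smoothed point. The function `ema_smooth` is a sketch under that assumption and may not match what the tool does.

```python
def ema_smooth(values, weight):
    """Exponential moving average; higher weight means heavier smoothing."""
    smoothed = []
    previous = None
    for value in values:
        if previous is None:
            previous = value  # the first point is passed through unchanged
        else:
            previous = weight * previous + (1.0 - weight) * value
        smoothed.append(previous)
    return smoothed


noisy_loss = [1.0, 0.2, 0.9, 0.1, 0.8]
print(ema_smooth(noisy_loss, 0.9))
```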

Standalone Scripts

Replay to W&B — analysis_tools/replay_to_wandb.py reads JSONL metric logs and uploads them to Weights & Biases without re-running the experiment.

python -m analysis_tools.replay_to_wandb \
    --jsonl output/mod_arithmetic/stride/stride.jsonl \
    --project my-project --group experiment-group --strategy stride \
    --config output/mod_arithmetic/stride/experiment_config.json

Dataset DFT Analysis — analysis_tools/dataset_dft_analysis.py performs spectral analysis of mod_arithmetic dataset orderings, comparing frequency content across data ordering strategies.

python -m analysis_tools.dataset_dft_analysis --p 9973 --train-size 300000 --seed 42
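To give a feel for what spectral analysis of an ordering can reveal, the sketch below computes a naive DFT of a label sequence. It is not the repository's analysis: the sequence `(stride * t) % p` is a toy stand-in for an ordered label stream, and `dft_magnitudes` is a hypothetical helper. A sequence that repeats with period p concentrates all of its energy in a few frequency bins.

```python
import cmath


def dft_magnitudes(signal):
    """Magnitude of each DFT bin, computed directly from the definition."""
    n = len(signal)
    magnitudes = []
    for k in range(n):
        total = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(total))
    return magnitudes


p, repeats, stride = 7, 3, 3  # tiny values so the O(n^2) DFT is instant
labels = [(stride * t) % p for t in range(p * repeats)]  # period-p sequence
mags = dft_magnitudes(labels)

# A period-p signal of length p * repeats is nonzero only at bins that are
# multiples of `repeats`; every other bin is zero up to rounding error.
off_harmonic = [m for k, m in enumerate(mags) if k % repeats != 0]
print(max(off_harmonic) < 1e-9)
```

An ordering with no periodic structure would instead spread energy across many bins, which is the kind of contrast a spectral comparison of orderings looks for.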

Console Modes

Flag	Mode	Behavior
(default)	NORMAL	Standard scrolling Rich output
`--live`	LIVE	Full-screen layout with real-time metrics sidebar
`--silent`	SILENT	Progress bar only; all other output suppressed
`--no-console-output`	NULL	No console output at all

Priority: When multiple flags are given, the highest-priority mode wins: NULL (--no-console-output) > SILENT (--silent) > LIVE (--live) > NORMAL (default). For example, --live --silent produces SILENT mode.
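The priority rule amounts to checking the flags from highest to lowest priority. The function name `resolve_console_mode` is hypothetical; only the mode names and their ordering come from the table above.

```python
def resolve_console_mode(live=False, silent=False, no_console_output=False):
    """Pick the console mode, checking flags in descending priority."""
    if no_console_output:
        return "NULL"
    if silent:
        return "SILENT"
    if live:
        return "LIVE"
    return "NORMAL"


print(resolve_console_mode(live=True, silent=True))  # SILENT
print(resolve_console_mode())                        # NORMAL
```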

Security Note

Checkpoint files (.pt) are loaded with torch.load(weights_only=False) because they contain optimizer state dicts, RNG states, and experiment-specific training state that require unpickling. Only load checkpoints that you have generated yourself or that come from a trusted source. Loading an untrusted checkpoint file can execute arbitrary code. See the PyTorch serialization docs for details.

Docker

A Dockerfile is provided for GPU cloud environments (tested on RunPod). Dependencies are baked into the image; code is pulled at startup via entrypoint.sh.

# Build
DOCKER_IMAGE_NAME=your-user/orderedlearning ./build-docker.sh

# Or build directly
docker build -t orderedlearning .

Environment variables for entrypoint.sh:

Variable	Default	Description
`REPO_URL`	(must be set)	GitHub repo URL without protocol (e.g., `github.com/user/OrderedLearning.git`)
`REPO_BRANCH`	`master`	Branch to clone/pull
`GITHUB_TOKEN`	(optional)	GitHub PAT for private repos
`WORKSPACE_DIR`	`/workspace/OrderedLearning`	Local clone directory

License

MIT -- see LICENSE.
