Local-first probabilistic forecasting for PJM day-ahead hourly zonal LMPs.
Implemented and validated through the portable tuning package and candidate import validation closeout:
- real AEP single-zone baseline workflow remains reproducible,
- optional MLflow tracking layer remains available (disabled by default),
- real rolling-origin AEP backtest execution completed for TFT and DeepAR across 3 folds,
- calibration diagnostics summarize coverage/width/crossing/bias from rolling forecasts,
- focused TFT/DeepAR search design generates evidence-driven search spaces,
- focused tuning execution supports local resource-safe mode with explicit heavy-run guard,
- portable tuning package export supports local_safe, cloud_16gb, and cloud_24gb profiles,
- imported ranked candidates are revalidated locally with promotion recompute and mismatch detection,
- full heavy tuning remains deferred on this local machine due to hardware limits.
Current real AEP metrics and smoke tuning diagnostics are workflow evidence, not final benchmark claims.
From existing Step 6 evidence:
- real PJM LMP rows (2024): 8784
- missing hours: 0
- duplicate timestamps: 0
- real weather rows (2024): 8784
- real panel rows after warmup: 8616
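The completeness counts above can be reproduced with a small standard-library check. This is a minimal sketch, not the project's Step 6 code: the function name hourly_completeness is hypothetical, and it assumes naive hourly timestamps on a fixed-offset clock (for example UTC), so daylight-saving shifts in local prevailing time are not handled. 2024 is a leap year, so a complete year has 366 × 24 = 8784 hours. The 8784 − 8616 = 168 rows dropped from the panel equal exactly one week, which would fit a 168-hour feature warmup, though that reading is an assumption.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime, timedelta


def hourly_completeness(timestamps: list[datetime], year: int) -> dict[str, int]:
    """Compare observed hourly stamps against every hour of one calendar year.

    Hypothetical helper; assumes naive timestamps on a fixed-offset clock (e.g. UTC).
    """
    start = datetime(year, 1, 1)
    expected_hours = int((datetime(year + 1, 1, 1) - start) / timedelta(hours=1))
    expected = {start + timedelta(hours=h) for h in range(expected_hours)}
    counts = Counter(timestamps)
    return {
        "expected": expected_hours,  # 8784 for leap year 2024, 8760 otherwise
        "rows": len(timestamps),
        "missing": len(expected - set(counts)),
        # Every extra copy of an hour counts as one duplicate.
        "duplicates": sum(n - 1 for n in counts.values() if n > 1),
    }
```

A clean 2024 series of 8784 consecutive hours yields missing = 0 and duplicates = 0, matching the Step 6 figures.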
Baseline caveat:
- TFT currently outperforms DeepAR on MAE/RMSE in this untuned run.
- TFT coverage_80 is below the desired 70%-90% guide range.
- DeepAR coverage_80 is 0.0 and should be treated as a calibration/config issue, not as a verdict on final model quality.
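The caveats above are stated in terms of interval coverage, width, quantile crossing, and bias. The following is a minimal sketch of how such diagnostics can be computed. It assumes each forecast hour carries p10/p50/p90 quantiles and that coverage_80 means the share of realized prices inside [p10, p90]. QuantileForecast, calibration_summary, and the collapse threshold are hypothetical names and values, not the analyze-calibration implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class QuantileForecast:
    """One forecast hour with its realized LMP (hypothetical record layout)."""

    actual: float
    p10: float
    p50: float
    p90: float


def calibration_summary(rows: list[QuantileForecast], collapse_width: float = 1e-6) -> dict[str, float | bool]:
    # Empirical coverage of the nominal 80% interval [p10, p90].
    coverage_80 = fmean(1.0 if r.p10 <= r.actual <= r.p90 else 0.0 for r in rows)
    mean_width = fmean(r.p90 - r.p10 for r in rows)
    # Quantile crossing: predicted quantiles that are not monotone.
    crossing_rate = fmean(1.0 if (r.p10 > r.p50 or r.p50 > r.p90) else 0.0 for r in rows)
    # Positive bias means the median forecast sits above the realized price.
    median_bias = fmean(r.p50 - r.actual for r in rows)
    return {
        "coverage_80": coverage_80,
        "mean_width_80": mean_width,
        "crossing_rate": crossing_rate,
        "median_bias": median_bias,
        "in_guide_range": 0.70 <= coverage_80 <= 0.90,
        "interval_collapse": mean_width <= collapse_width,
    }
```

A collapsed interval (p10 ≈ p90) almost never contains the realized price, which is one way a coverage_80 of 0.0 can arise.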
Dry-run training (no writes):
uv run python -m lmp_forecaster.cli train-single-zone-baselines --zone AEP
Write training without tracking:
uv run python -m lmp_forecaster.cli train-single-zone-baselines \
--zone AEP \
--panel-path data/processed/panel/single_zone/AEP_panel.parquet \
--write
Write training with MLflow enabled:
uv run python -m lmp_forecaster.cli train-single-zone-baselines \
--zone AEP \
--panel-path data/processed/panel/single_zone/AEP_panel.parquet \
--enable-tracking \
--experiment-name lmp_probabilistic_forecaster_smoke \
--tracking-uri file:./mlruns \
--write
Rolling backtest dry-run (plan only, no training/writes/tracking):
uv run python -m lmp_forecaster.cli run-rolling-backtest \
--zone AEP \
--panel-path data/processed/panel/single_zone/AEP_panel.parquet \
--folds 3 \
--horizon-hours 24
Real write execution:
uv run python -m lmp_forecaster.cli run-rolling-backtest \
--zone AEP \
--panel-path data/processed/panel/single_zone/AEP_panel.parquet \
--folds 3 \
--horizon-hours 24 \
--write
Useful toggles:
- --skip-tft or --skip-deepar for single-model runs.
- --enable-tracking --tracking-uri <uri> --experiment-name <name> for optional MLflow logging.
- --max-steps <int> to cap per-fold smoke-safe training steps.
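A rolling-origin backtest fits each fold only on data before its forecast origin and scores the next horizon_hours. The sketch below shows one common layout: an expanding training window and non-overlapping 24-hour test windows that end at the last panel row. Fold and rolling_origin_folds are hypothetical, and how run-rolling-backtest actually spaces its origins is an assumption here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Fold:
    """One backtest fold over row positions of an hourly panel (hypothetical helper)."""

    index: int
    train_end: int   # exclusive: rows [0, train_end) may be used for fitting
    test_start: int  # forecast origin, i.e. the first forecast hour
    test_end: int    # exclusive end of the forecast window


def rolling_origin_folds(
    n_rows: int, folds: int = 3, horizon_hours: int = 24, step_hours: int | None = None
) -> list[Fold]:
    """Expanding-window folds whose test windows end at the last panel row."""
    step = horizon_hours if step_hours is None else step_hours
    if folds < 1 or horizon_hours < 1 or step < 1:
        raise ValueError("folds, horizon_hours, and step_hours must be positive")
    last_origin = n_rows - horizon_hours
    # Earliest origin first; each later origin moves forward by `step` hours.
    origins = [last_origin - step * k for k in reversed(range(folds))]
    if origins[0] <= 0:
        raise ValueError("panel too short for the requested folds and horizon")
    # Training ends exactly at the origin, so no forecast hour leaks into fitting.
    return [Fold(i, o, o, o + horizon_hours) for i, o in enumerate(origins)]
```

Under these assumptions, the 8616-row AEP panel with folds=3 and horizon_hours=24 gives forecast origins at rows 8544, 8568, and 8592.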
Generated artifacts remain local and ignored by Git:
- backtest forecasts/metrics: data/cache/backtests/
- reports: data/cache/reports/
- forecast caches: data/cache/forecasts/
- processed panel outputs: data/processed/
- model artifacts: artifacts/
- MLflow local tracking: mlruns/
- MLflow scratch artifacts: .mlflow_artifacts/
- local runtime logs/checkpoints: lightning_logs/, checkpoints/
Calibration diagnostics dry-run (reads latest rolling outputs, writes nothing):
uv run python -m lmp_forecaster.cli analyze-calibration --zone AEP
Calibration diagnostics write-mode:
uv run python -m lmp_forecaster.cli analyze-calibration --zone AEP --write
Focused search design dry-run:
uv run python -m lmp_forecaster.cli design-focused-search --zone AEP
Focused search design write-mode:
uv run python -m lmp_forecaster.cli design-focused-search --zone AEP --write
Rationale:
- TFT remains under-covered and needs calibration-oriented adjustments.
- DeepAR shows interval collapse behavior and needs targeted distribution/calibration recovery.
- This step produces design artifacts only; it does not run large tuning.
Quality checks:
uv run ruff check .
uv run mypy src
uv run pytest -q
Why full tuning is not run locally:
- this workstation is constrained to roughly 8GB VRAM, 16GB RAM, and around 100GB free disk,
- earlier heavier tuning attempts were unstable,
- local execution is intentionally limited to planning and bounded validation.
Resource profiles:
- local_safe: tiny local planning/smoke profile,
- cloud_16gb: intended first external package target,
- cloud_24gb: larger external search profile.
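The explicit heavy-run guard can be pictured as a check that refuses a profile larger than the local machine unless the caller opts in. In this sketch the VRAM budgets (8, 16, and 24 GB) are assumptions read off the profile names and the roughly 8GB local VRAM note. pick_profile and guard_heavy_run are hypothetical helpers, not the package's API.

```python
# Hypothetical VRAM budgets (GB) inferred from the profile names and the ~8GB
# local GPU; the package's real profile definitions may differ.
PROFILE_VRAM_GB = {"local_safe": 8, "cloud_16gb": 16, "cloud_24gb": 24}


def pick_profile(available_vram_gb: float) -> str:
    """Return the largest profile whose assumed budget fits the available VRAM."""
    fitting = [name for name, gb in PROFILE_VRAM_GB.items() if gb <= available_vram_gb]
    if not fitting:
        raise ValueError(f"no profile fits {available_vram_gb} GB of VRAM")
    return max(fitting, key=PROFILE_VRAM_GB.__getitem__)


def guard_heavy_run(profile: str, available_vram_gb: float, allow_heavy: bool = False) -> None:
    """Refuse an over-budget profile unless the caller explicitly opts in."""
    needed = PROFILE_VRAM_GB[profile]
    if needed > available_vram_gb and not allow_heavy:
        raise RuntimeError(
            f"profile {profile!r} assumes {needed} GB VRAM but only {available_vram_gb} GB "
            "is available; export a tuning package instead or pass allow_heavy=True"
        )
```

Requiring an explicit opt-in keeps accidental heavy runs from starting on the constrained workstation, while leaving planning and export untouched.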
Export package dry-run (no writes):
uv run python -m lmp_forecaster.cli export-tuning-package \
--zone AEP \
--resource-profile cloud_16gb \
--models TFT,DeepAR \
--max-trials 12 \
--folds 2
Export package write:
uv run python -m lmp_forecaster.cli export-tuning-package \
--zone AEP \
--resource-profile cloud_16gb \
--models TFT,DeepAR \
--max-trials 12 \
--folds 2 \
--write
Import external ranked results and recompute promotion locally:
uv run python -m lmp_forecaster.cli import-tuning-results \
--zone AEP \
--ranked-candidates-path <ranked_candidates_csv>
Write import validation report:
uv run python -m lmp_forecaster.cli import-tuning-results \
--zone AEP \
--ranked-candidates-path <ranked_candidates_csv> \
--write
Import behavior summary:
- required candidate schema is enforced,
- promotion decisions are recomputed locally from baseline metrics,
- imported promotion labels are treated as advisory only,
- mismatch between imported label and recomputed status is reported,
- under-covered or interval-collapse candidates are rejected.
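The import rules above can be sketched as follows. The column names, status labels, and promotion rule are assumptions for illustration. Under that rule, a candidate is promoted only if it beats the same model's baseline MAE with coverage_80 inside the 70%-90% guide range and a non-collapsed interval. load_candidates and recompute_promotion are hypothetical, not the import-tuning-results code.

```python
from __future__ import annotations

import csv
from collections.abc import Iterable
from dataclasses import dataclass

# Hypothetical schema; the real ranked-candidates CSV may use other column names.
REQUIRED_COLUMNS = {"candidate_id", "model", "mae", "coverage_80", "mean_width_80", "promotion"}


@dataclass(frozen=True)
class Decision:
    candidate_id: str
    recomputed: str
    imported: str

    @property
    def mismatch(self) -> bool:
        """True when the advisory imported label disagrees with the local decision."""
        return self.recomputed != self.imported


def load_candidates(path: str) -> list[dict[str, str]]:
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def recompute_promotion(
    rows: Iterable[dict[str, str]],
    baseline_mae: dict[str, float],
    coverage_range: tuple[float, float] = (0.70, 0.90),
    collapse_width: float = 1e-6,
) -> list[Decision]:
    decisions: list[Decision] = []
    for row in rows:
        # Enforce the required schema before reading any metric.
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            raise ValueError(f"candidate {row.get('candidate_id')!r} lacks columns {sorted(missing)}")
        coverage = float(row["coverage_80"])
        if float(row["mean_width_80"]) <= collapse_width:
            status = "rejected_interval_collapse"
        elif coverage < coverage_range[0]:
            status = "rejected_under_covered"
        elif coverage <= coverage_range[1] and float(row["mae"]) < baseline_mae[row["model"]]:
            status = "promoted"
        else:
            status = "not_promoted"
        # The imported label is only compared against, never used to decide.
        decisions.append(Decision(row["candidate_id"], status, row["promotion"]))
    return decisions
```

Any Decision with mismatch set to True corresponds to the kind of label disagreement the validation report is described as surfacing.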
If no candidate is promoted:
- keep current baseline as active,
- tune externally with a stronger profile (cloud_16gb or cloud_24gb),
- import new ranked results,
- re-run the 3-fold rolling backtest on the best candidate only, before any promotion.