Retrospective shadow optimization for GPU infrastructure conversations.
Blackout answers:
Based on historical workload logs, what would Blackout have recommended, how much might it have saved, how trustworthy is the data, and what should we pilot first?
Shadow mode only. No jobs are moved. The app uses uploaded customer data and policy settings to produce an offline report for review. It is not live scheduling, billing, carbon accounting, or production automation.
- What it does: runs a retrospective shadow diagnostic on uploaded GPU workload, region, and policy data.
- Web app: run
npm run dev, then openhttp://127.0.0.1:5173. - CLI report: run
npm run report -- --workloads fixtures/normal-week/workloads.csv --regions fixtures/normal-week/regions.csv --policy fixtures/normal-week/policy.json --out reports/normal-week. - Demo reports: run
npm run demo:reportsto generatereports/demo/normal-week/,reports/demo/grid-stress-week/, andreports/demo/policy-heavy-week/. - Design-partner packet: see design-partner/overview.md.
- Data templates: see templates/.
- Deployment prep: see docs/deployment.md.
- Current boundaries: no live scheduling, forecasting, cloud integration, auth, billing, database, marketplace, Kubernetes controller, Ray integration, or Slurm integration.
npm install
npm run dev
npm run report -- --workloads fixtures/normal-week/workloads.csv --regions fixtures/normal-week/regions.csv --policy fixtures/normal-week/policy.json --out reports/normal-week
npm run demo:reports
npm test
npm run build
npm startDevelopment runs:
- Web UI:
http://127.0.0.1:5173 - API:
http://127.0.0.1:3001
npm start runs the API server. After npm run build, the API also serves the built web UI from dist-web.
Use npm run dev during development. Upload workload and region CSV files, configure policy, and run the retrospective report from the browser. The app does not load fixture data automatically.
Shadow mode uses historical workload logs and uploaded region data to estimate what Blackout would have recommended.
- No jobs are moved.
- No production schedulers are called.
- No cloud inventory is queried.
- No capacity is reserved outside the report.
- Recommendations must be reviewed before production use.
- Cost and carbon numbers are estimates from uploaded inputs.
The report is for decision support, not exact billing.
The diagnostic report is the customer-facing version of the retrospective report. It is meant for founder-led discovery calls and pilot scoping, not for production automation.
It includes:
- executive summary
- workload flexibility summary
- estimated savings range
- estimated carbon impact
- data quality score
- top blockers
- top movable workload types
- top pinned workload types
- top opportunities
- policy constraints applied
- assumptions used
- savings not counted
- recommended pilot scope
- recommended next step
The Markdown export is designed to be readable as a customer artifact. It uses direct language such as estimated, based on uploaded data, and requires validation against actual billing.
- Upload a workload CSV from the period you want to review.
- Upload a region CSV with electricity price, carbon intensity, GPU capacity, grid stress, and optional PUE/latency/reliability data.
- Configure policy constraints in the UI or upload a
policy.jsonthrough the API. - Click
Run retrospective report. - Review the batch estimate, diagnostic report, recommendation mix, confidence counts, priority breakdown, blockers, savings breakdowns, and workload rows.
- Export JSON, workload CSV, or diagnostic Markdown for the customer conversation.
The UI does not silently load demo data. Fixture files must be uploaded manually.
Run the diagnostic flow without the web server:
npm run report -- --workloads fixtures/normal-week/workloads.csv --regions fixtures/normal-week/regions.csv --policy fixtures/normal-week/policy.json --out reports/normal-weekOutputs:
report.jsonrecommendations.csvdiagnostic.md
The CLI creates the output directory if it is missing. It fails with a nonzero exit code when an input path is missing, a CSV cannot be parsed, policy JSON is invalid, or the files do not validate together.
Generate all fixture demo reports:
npm run demo:reportsOutputs are written under reports/demo/. The sample output index lives at design-partner/sample_outputs.md.
Optional assumptions:
npm run report -- --workloads workloads.csv --regions regions.csv --policy policy.json --out reports/customer-a --gpu-kwh-assumption 0.8 --default-pue 1.18Required headers:
id,workload_type,gpu_type,gpu_count,expected_duration_minutes,current_region,priority,latency_sensitive,can_delay,can_move,checkpointableOptional headers:
customer_id,deadline_minutes_from_now,allowed_regions,max_latency_ms,data_residency_region,estimated_revenue_usdRules:
gpu_countmust be an integer greater than0.expected_duration_minutesmust be greater than0.deadline_minutes_from_now,max_latency_ms, andestimated_revenue_usdmust be>= 0when present.prioritymust below,normal,high, orcritical.- Boolean fields must be
trueorfalse. allowed_regionsuses pipe-separated region names, for exampleus-east-1|us-west-2.- Duplicate workload IDs are rejected.
max_latency_msis required whenlatency_sensitive=true.
Required headers:
region,electricity_price_per_kwh,carbon_intensity_g_per_kwh,gpu_available,grid_stressOptional headers:
pue,avg_latency_ms,reliability_scoreRules:
electricity_price_per_kwhandcarbon_intensity_g_per_kwhmust be>= 0.gpu_availablemust be an integer>= 0.grid_stressmust below,medium, orhigh.puemust be>= 1when present.avg_latency_msmust be>= 0when present.reliability_scoremust be from0to1when present.
The UI sends this policy shape:
{
"max_delay_minutes": 60,
"allowed_regions": [],
"blocked_regions": [],
"carbon_ceiling_g_per_kwh": null,
"max_latency_ms": null,
"require_manual_for_low_confidence": true
}Empty allowed_regions means every uploaded region is eligible unless blocked by another rule.
Invalid policy combinations are reported in validation_errors. Examples include a region being both allowed and blocked, negative limits, unknown region references, or a policy that leaves no usable region.
Templates live under templates/:
templates/workloads_template.csvtemplates/regions_template.csvtemplates/policy_template.jsontemplates/customer_data_request.md
The CSV parser does not support comments, so the CSV templates include one example row. Remove the example row before sending real data.
Data can be anonymized:
customer_idcan be hashed.- Workload IDs can be random.
- Exact revenue can be omitted.
- Workload type names can be coarse.
- Region names should remain real enough to map to energy and latency data.
Do not include secrets, credentials, customer names, model contents, prompts, training data, source code, private URLs, or production scheduler tokens.
Capacity is reserved within a single batch report. The optimizer does not solve global bin packing.
Processing order is deterministic:
criticalhighnormallow
Within the same priority, input order is stable.
When a workload is assigned to move_region, run_now, or pinned, the workload's gpu_count is subtracted from that region's remaining capacity. delay, manual_review, and invalid rows do not reserve capacity because they are not automatic assignments.
Rows expose capacity evidence:
capacity_checkedcapacity_reservedremaining_region_capacity_after_assignmentcapacity_reason
Example reason:
Cannot move because us-west-2 does not have enough remaining GPU capacity.
The estimate is deliberately simple:
estimated_kwh = gpu_count * expected_duration_hours * gpu_kwh_assumption * pue
cost_usd = estimated_kwh * electricity_price_per_kwh
carbon_g = estimated_kwh * carbon_intensity_g_per_kwh
Defaults:
gpu_kwh_assumption = 0.7 kWh per GPU-hour
default_pue = 1.2
If a region row has pue, that row uses the uploaded value. If pue is missing, the row uses default_pue.
Estimated savings count only valid automatic move_region recommendations that reduce estimated dollar cost. The report does not count delayed workloads without forecast data, manual review opportunities, pinned workloads, invalid rows, or moves that do not reduce estimated cost.
The diagnostic report shows a rough planning range:
low = expected * 0.5
expected = current estimated savings
high = expected * 1.25
Values are clamped at 0. This range is not billing truth. It is a conversation aid that must be validated against actual cloud and utility bills.
Data quality is scored as high, medium, or low with a numeric score from 0 to 100.
The score considers:
- missing deadlines
- missing
allowed_regions - unknown or invalid region references
- invalid workload rows
- missing PUE
- missing latency data
- missing reliability scores
- missing estimated revenue
- empty or broad policy constraints
- missing carbon ceiling
- missing max latency policy
Examples:
14% of workloads are missing deadlines.
PUE default was used for 2 regions.
Reliability score missing for all regions.
Low data quality does not mean the customer did anything wrong. It means the report is less useful for production decisions.
The diagnostic report recommends a pilot only when the uploaded data supports one.
The logic prefers workload types with:
- counted estimated savings
- medium or high confidence
- low blocker count
- non-critical priority
- no risky latency sensitivity
It avoids critical workloads and latency-sensitive workloads unless the report has safe options. It prefers regions that appear often in successful recommendations. If the data is too weak, it says no pilot is recommended yet and explains why.
Pilot output includes recommended workload types, regions, excluded workload types, excluded priority levels, duration, success metric, reason, and risks to watch.
The report explicitly lists what is not counted:
- delay savings without forecast data
- demand response revenue
- utility bill reconciliation
- live cloud inventory
- real capacity reservations
- customer contract pricing unless reflected in uploaded prices
- cooling estimates beyond PUE assumptions
Top-level report fields:
generated_atraw_policyassumptionsworkload_input_summaryregion_input_summarysummarysavings_rangedata_qualitypilot_recommendationnot_counted_savingsdiagnosticaggregate_report_summarybreakdownsrecommendationsrowsvalidation_errors
Summary includes:
- total, valid, and invalid workloads
run_now,delay,move_region,manual_review, andpinnedcounts- estimated baseline cost, recommended cost, savings, and savings percentage
- estimated baseline carbon, recommended carbon, and carbon delta
- movable and pinned percentages
- capacity, latency, policy, and data residency blocker counts
Breakdowns include:
- savings by workload type
- savings by current region
- savings by recommended region
- recommendation counts by type
- recommendation counts by priority
- confidence counts
- top reasons workloads could not move
- top estimated savings opportunities
- workloads excluded from estimated savings
The full JSON export is the RetrospectiveReport object and includes raw policy, assumptions, input summaries, per-workload recommendations, aggregate summary, validation errors, diagnostic summary, data quality, savings range, pilot recommendation, not-counted items, and generated_at.
The workload CSV export includes:
id,workload_type,priority,current_region,recommended_region,recommendation,confidence,reason,baseline_cost,recommended_cost,estimated_savings,baseline_carbon,recommended_carbon,carbon_delta,delay_minutes,valid,validation_errorsInvalid workloads appear in exports with valid=false and row validation messages.
The diagnostic Markdown export includes:
- executive summary
- workload flexibility
- estimated savings range
- data quality
- top blockers
- best pilot candidate
- assumptions
- what the report does not claim
- recommended next step
POST /api/report/retrospective
Multipart fields:
workloads: workload CSV fileregions: region CSV filepolicy: JSON policy stringgpu_kwh_assumption: optional numberdefault_pue: optional number
JSON input is also supported:
{
"workloads": [],
"regions": [],
"policy": {},
"gpu_kwh_assumption": 0.7,
"default_pue": 1.2
}POST /api/export/report/workloads.csv
Body:
{
"rows": []
}Returns one CSV row per workload recommendation.
POST /api/export/report/diagnostic.md
Body:
{
"report": {}
}Returns the diagnostic Markdown report generated from the report object.
The older POST /api/optimize endpoint remains for the original recommendation flow and now reserves capacity across its batch.
Fixture sets live under fixtures/. Upload their files manually through the UI or use them in tests.
fixtures/normal-week/- mixed workload types
- moderate prices and savings
- some pinned workloads
fixtures/grid-stress-week/- high electricity price, carbon, and grid stress in one region
- move and manual review recommendations
- enough workloads to show capacity reservation
fixtures/policy-heavy-week/- blocked regions
- data residency
- latency-sensitive jobs
- critical priority jobs
- policy preventing unsafe moves
Each set contains:
workloads.csvregions.csvpolicy.jsonexpected-notes.md
The old root-level fixtures/workloads.csv and fixtures/regions.csv remain as small smoke-test inputs.
- No live cloud inventory.
- No forecast integration.
- No forecast-based delay savings.
- No live scheduling.
- No automatic production scheduling.
- No live migration.
- No cloud provider integration.
- No utility bill reconciliation.
- No demand response revenue.
- No energy procurement.
- Cost and carbon are estimates.
- Cost and carbon estimates depend on uploaded data quality.
- Capacity reservation is simple and batch-local.
- No Kubernetes controller.
- No Ray integration.
- No Slurm integration.
- No marketplace, auth, or billing features.