unibuc-cs/GASP

SmartCity-GASP-MARL

This repository is a compact research demo for Governed Agent Societies in a smart-city setting. It implements a symbolic multi-agent environment, typed action traces, runtime governance guards, deterministic baselines, and small IPPO/MAPPO-style multi-agent reinforcement learning (MARL) trainers.

The demo is intentionally toy-sized. It is not a realistic city simulator. It is designed to make the following paper concepts executable and inspectable:

  • multi-role execution under partial observations;
  • scenario-based role activation;
  • typed action records instead of free-form control only;
  • governance-aware transition dynamics;
  • JSONL execution traces;
  • trace-centric metrics such as TSC, EscCE, GAU, activation precision/recall, and a simple RDC proxy;
  • deterministic and learned multi-agent policy modes.
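As an illustration of the activation precision/recall metric listed above, the sketch below treats role activation as a set comparison: precision is the share of activated roles that the scenario required, and recall is the share of required roles that were activated. This definition, the function name `activation_precision_recall`, and every role name other than `TrafficAgent` are assumptions made for the example; the repository's own metric code under core/ is authoritative.

```python
def activation_precision_recall(activated, required):
    """Set-based precision/recall of activated roles against required roles.

    Assumed definition for illustration; not confirmed by the repository.
    """
    activated, required = set(activated), set(required)
    hits = len(activated & required)
    # Convention chosen here: an empty denominator yields 1.0.
    precision = hits / len(activated) if activated else 1.0
    recall = hits / len(required) if required else 1.0
    return precision, recall


# Hypothetical role sets: one correct activation, one spurious, one missed.
precision, recall = activation_precision_recall(
    activated={"TrafficAgent", "EnergyAgent"},
    required={"TrafficAgent", "HealthAgent"},
)
print(precision, recall)
```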

Project structure

smartcity_gasp_marl/
  smartcity_gasp/
    core/          # state, scenarios, typed actions, governance, metrics, traces
    envs/          # PettingZoo-like parallel MARL environment
    policies/      # deterministic and random policies
    marl/          # compact PPO, IPPO, and MAPPO-style training utilities
    experiments/   # runnable scripts
  configs/         # example experiment configuration files
  docs/            # algorithm alignment and paper integration notes
  outputs/         # generated sample outputs
  tests/           # lightweight smoke tests

Installation

The deterministic environment requires only Python and NumPy.

python -m pip install -r requirements.txt

For MARL training, install a PyTorch build appropriate for your machine. The deterministic baselines still run if PyTorch is not available.

Quick start

Run the deterministic baselines:

python -m smartcity_gasp.experiments.run_deterministic_baselines --episodes 20

Run a short MAPPO smoke training session:

python -m smartcity_gasp.experiments.train_marl --algorithm mappo --episodes 20 --eval-episodes 10

Run the governed MAPPO variant:

python -m smartcity_gasp.experiments.train_marl --algorithm mappo --governed --episodes 20 --eval-episodes 10

Run the compact end-to-end smoke suite:

python -m smartcity_gasp.experiments.evaluate_all --quick

Execution modes

The deterministic experiment compares four modes:

| Mode | Meaning |
| --- | --- |
| B0 direct controller | A single direct controller acts without role activation or guard. |
| B1 all agents, no guard | All service agents act in every scenario; no governance guard. |
| B2 activated, no guard | Only scenario-relevant agents act; no governance guard. |
| B3 activated, governed | Scenario-relevant agents act; high-impact actions pass through the guard. |
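The four modes differ along three switches: whether a single controller replaces the service agents, whether roles are activated per scenario, and whether the guard is on. A minimal sketch of that reading follows; `ModeConfig`, `MODES`, `acting_roles`, and the role names `DirectController`, `EnergyAgent`, and `HealthAgent` are hypothetical names for this example, not identifiers from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeConfig:
    single_controller: bool  # one direct controller instead of service agents
    role_activation: bool    # only scenario-relevant agents act
    guard: bool              # high-impact actions pass through the guard


# Mapping of the four deterministic modes onto the three switches.
MODES = {
    "B0": ModeConfig(single_controller=True, role_activation=False, guard=False),
    "B1": ModeConfig(single_controller=False, role_activation=False, guard=False),
    "B2": ModeConfig(single_controller=False, role_activation=True, guard=False),
    "B3": ModeConfig(single_controller=False, role_activation=True, guard=True),
}


def acting_roles(mode, all_roles, required_roles):
    """Return the roles allowed to act in a scenario under the given mode."""
    cfg = MODES[mode]
    if cfg.single_controller:
        return ["DirectController"]  # hypothetical name for the B0 controller
    if cfg.role_activation:
        return [role for role in all_roles if role in required_roles]
    return list(all_roles)


roles = ["TrafficAgent", "EnergyAgent", "HealthAgent"]
for mode in MODES:
    print(mode, acting_roles(mode, roles, required_roles={"TrafficAgent"}))
```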

The MARL experiment supports:

| Mode | Meaning |
| --- | --- |
| IPPO unguarded | Independent PPO-style actor/critic over local observations. |
| MAPPO unguarded | Local actor with centralized critic over global state. |
| MAPPO governed | Same centralized-critic setup, but actions pass through the guard. |
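The practical difference between the IPPO and MAPPO modes is what the value function sees. The sketch below shows only that input selection, with plain Python lists standing in for observation vectors; `actor_input`, `critic_input`, and the sample values are assumptions for the example and say nothing about how the repository's trainers are actually structured.

```python
def actor_input(agent, local_obs):
    """In both IPPO and MAPPO the actor sees only its own local observation."""
    return local_obs[agent]


def critic_input(algorithm, agent, local_obs, global_state):
    """Select the vector the value function is evaluated on."""
    if algorithm == "ippo":
        return local_obs[agent]   # independent critic: local view only
    if algorithm == "mappo":
        return global_state       # centralized critic: shared global state
    raise ValueError(f"unknown algorithm: {algorithm!r}")


# Hypothetical two-agent example with made-up feature values.
local_obs = {"TrafficAgent": [0.2, 0.9], "EnergyAgent": [0.0, 0.4]}
global_state = [0.2, 0.9, 0.0, 0.4, 1.0]

print(critic_input("ippo", "TrafficAgent", local_obs, global_state))
print(critic_input("mappo", "TrafficAgent", local_obs, global_state))
```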

Smart-city scenarios

Episodes sample one of four symbolic incident families:

  1. traffic accident near a hospital;
  2. power outage in a critical district;
  3. flooded underpass;
  4. pollution spike near a school.

Each scenario has severity, evidence quality, congestion, hospital-access risk, pollution-zone status, power/water status, overseer availability, and a required role set.
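One way to picture a scenario is as an immutable record with one field per attribute named above. The sketch below is an assumption-laden illustration: the class and field names, the value ranges, the uniform sampling, and the `REQUIRED_ROLES` mapping (including every role name other than `TrafficAgent`) are hypothetical and may not match the code in core/.

```python
import random
from dataclasses import dataclass
from enum import Enum


class IncidentFamily(Enum):
    TRAFFIC_ACCIDENT_NEAR_HOSPITAL = 1
    POWER_OUTAGE_CRITICAL_DISTRICT = 2
    FLOODED_UNDERPASS = 3
    POLLUTION_SPIKE_NEAR_SCHOOL = 4


# Hypothetical required-role sets, for illustration only.
REQUIRED_ROLES = {
    IncidentFamily.TRAFFIC_ACCIDENT_NEAR_HOSPITAL: frozenset({"TrafficAgent", "HealthAgent"}),
    IncidentFamily.POWER_OUTAGE_CRITICAL_DISTRICT: frozenset({"EnergyAgent"}),
    IncidentFamily.FLOODED_UNDERPASS: frozenset({"TrafficAgent", "WaterAgent"}),
    IncidentFamily.POLLUTION_SPIKE_NEAR_SCHOOL: frozenset({"EnvironmentAgent"}),
}


@dataclass(frozen=True)
class Scenario:
    family: IncidentFamily
    severity: float              # assumed range [0, 1]
    evidence_quality: float      # assumed range [0, 1]
    congestion: float            # assumed range [0, 1]
    hospital_access_risk: float  # assumed range [0, 1]
    pollution_zone_active: bool
    power_ok: bool
    water_ok: bool
    overseer_available: bool
    required_roles: frozenset


def sample_scenario(rng):
    """Draw one scenario; uniform sampling is an assumption of this sketch."""
    family = rng.choice(list(IncidentFamily))
    return Scenario(
        family=family,
        severity=rng.random(),
        evidence_quality=rng.random(),
        congestion=rng.random(),
        hospital_access_risk=rng.random(),
        pollution_zone_active=rng.random() < 0.5,
        power_ok=rng.random() < 0.5,
        water_ok=rng.random() < 0.5,
        overseer_available=rng.random() < 0.5,
        required_roles=REQUIRED_ROLES[family],
    )


print(sample_scenario(random.Random(0)))
```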

Typed actions

RL policies output integer actions. The environment converts each integer into a typed action record before guard evaluation. A trace entry therefore contains fields such as:

{
  "role": "TrafficAgent",
  "action_type": "open_bus_lane",
  "target": "hospital_route",
  "evidence_refs": ["verified_incident_report"],
  "risk_level": "high",
  "escalation_requested": false,
  "p_escalation": 0.65
}
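The integer-to-record conversion can be sketched as a per-role lookup table. The field names below mirror the trace entry above; the `TypedAction` class, the `ACTION_TABLE` contents, the `noop` action, and the default values are assumptions for the example rather than the repository's actual action space.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class TypedAction:
    role: str
    action_type: str
    target: str
    evidence_refs: list = field(default_factory=list)
    risk_level: str = "low"
    escalation_requested: bool = False
    p_escalation: float = 0.0


# Hypothetical table: action index -> (action_type, target, risk_level).
ACTION_TABLE = {
    "TrafficAgent": [
        ("noop", "none", "low"),
        ("open_bus_lane", "hospital_route", "high"),
    ],
}


def to_typed_action(role, action_index, evidence_refs=()):
    """Convert an integer policy output into a typed action record."""
    action_type, target, risk_level = ACTION_TABLE[role][action_index]
    return TypedAction(
        role=role,
        action_type=action_type,
        target=target,
        evidence_refs=list(evidence_refs),
        risk_level=risk_level,
    )


record = to_typed_action("TrafficAgent", 1, ["verified_incident_report"])
print(asdict(record))
```

Keeping the table outside the policy means the guard and the trace writer can work on named fields, while the learner still sees a small discrete action space.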

Governance rules

The current guard implements simple, inspectable rules:

  • public alerts require verified evidence;
  • bus-lane opening for longer than five minutes requires approval;
  • road closure affecting hospital access requires escalation;
  • memory writes require a source and an expiration condition;
  • pollution-zone rerouting is sanitized unless emergency priority is active;
  • inactive service roles are denied in governed execution.
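The rules above can be read as an ordered chain of checks over a typed action. The following is a minimal sketch of that idea, not the repository's guard: the outcome labels, the `guard` signature, the action types other than `open_bus_lane`, the extra fields (`duration_min`, `affects_hospital_access`, `source`, `expires`), and the convention that verified evidence references start with `verified_` are all assumptions made for illustration.

```python
def guard(action, active_roles, emergency_priority=False):
    """Return (outcome, reason) for one typed action dict."""
    kind = action["action_type"]

    # Inactive service roles are denied in governed execution.
    if action["role"] not in active_roles:
        return "deny", "inactive role"

    # Public alerts require verified evidence.
    if kind == "public_alert":
        refs = action.get("evidence_refs", [])
        if not any(ref.startswith("verified_") for ref in refs):
            return "deny", "public alert without verified evidence"

    # Bus-lane opening for longer than five minutes requires approval.
    if kind == "open_bus_lane" and action.get("duration_min", 0) > 5:
        return "require_approval", "bus lane open longer than five minutes"

    # Road closure affecting hospital access requires escalation.
    if kind == "close_road" and action.get("affects_hospital_access", False):
        return "escalate", "road closure affects hospital access"

    # Memory writes require a source and an expiration condition.
    if kind == "memory_write" and not (action.get("source") and action.get("expires")):
        return "deny", "memory write missing source or expiration"

    # Pollution-zone rerouting is sanitized unless emergency priority is active.
    if kind == "reroute_pollution_zone" and not emergency_priority:
        return "sanitize", "pollution-zone rerouting without emergency priority"

    return "allow", "no rule triggered"


action = {"role": "TrafficAgent", "action_type": "open_bus_lane", "duration_min": 10}
print(guard(action, active_roles={"TrafficAgent"}))
```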

Outputs

Generated outputs are written under outputs/results/:

deterministic_results.csv
deterministic_summary.md
deterministic_table.tex
example_trace.json
traces/*.jsonl
*_learning_curve.csv
*_results.csv
*_summary.md
*_table.tex

The JSONL traces are the most useful artifact for the paper: they show proposed actions, evidence links, guard outcomes, violations, support annotations, and reward signals.
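Because JSONL stores one JSON object per line, a trace can be summarized with the standard library alone. The sketch below tallies guard outcomes; the `guard_outcome` key and the sample records are assumptions, so check a generated file under outputs/results/traces/ for the real field names before relying on it.

```python
import json
from collections import Counter


def count_guard_outcomes(lines, key="guard_outcome"):
    """Tally one field across JSONL lines; blank lines are skipped."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        counts[record.get(key, "missing")] += 1
    return counts


# Made-up sample records standing in for a trace file.
sample = [
    '{"role": "TrafficAgent", "action_type": "open_bus_lane", "guard_outcome": "require_approval"}',
    '{"role": "TrafficAgent", "action_type": "noop", "guard_outcome": "allow"}',
    "",
]
outcomes = count_guard_outcomes(sample)
print(dict(outcomes))
```

The function accepts any iterable of lines, so an open file handle can be passed directly in place of `sample`.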

Notes for paper use

The code directly implements the governed-simulation and trace-evaluation parts of the paper's algorithm. The learning layer provides IPPO- and MAPPO-style policy training: actors act from local observations, while the MAPPO critic is centralized and uses global state. In this version the verifier and governor are rule-based runtime guards, which keeps governance inspectable and avoids learning a black-box safety policy before the trace metrics are validated.

See docs/algorithm_alignment.md and docs/paper_integration_notes.md for the precise mapping between the code and the paper sections.
