77 changes: 77 additions & 0 deletions rust-executor/crates/holograph/README.md
@@ -0,0 +1,77 @@
# `holograph`

AD4M's Kitsune2-backed substrate for the perspective-diff-sync DAG,
built as the v1 Holograph spike (see SPIKE.md). It replaces a Holochain
conductor with a sled-backed `KvOpStore`, a `HolographIntegrationQueue`,
and a `HolographSpace`, driving the same `Workspace` / `Snapshot`
algorithm crate that the HDK retriever uses.

This crate is part of the four-PR Holograph stack:

- **PR-A** — algorithm crate extraction (substrate-neutral DAG ops)
- **PR-B** — this crate
- **PR-C** — AD4M `holograph-link` Language + JS wires
- **PR-D** — production polish (sled recovery, fetch fallback,
graceful shutdown, restart-survives, iroh relay env hook)

## Configuration

Per-space behaviour is set via `SpaceConfig`
(see `src/config.rs`). The v1 default is
`SpaceConfig::full_replication_single_doc()`: every node holds every
op (full arc), one shared document, signature-and-parents validation,
and a 5s gossip cadence. Tests and v1.5 sharded deployments build their
own configs.
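A minimal sketch of the "build your own config" pattern, assuming a trimmed, hypothetical stand-in for `SpaceConfig` that keeps only two of its real fields. The helper `test_config_with_relay` is illustrative and not part of the crate: it overrides one field and inherits the rest from the v1 default with Rust's struct-update syntax.

```rust
/// Hypothetical, trimmed stand-in: the real SpaceConfig also carries
/// arc_policy, loc_fn_policy, validation_regime and fetch_fallback_policy.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SpaceConfig {
    gossip_initiate_interval_ms: Option<u32>,
    iroh_relay_url: Option<String>,
}

impl SpaceConfig {
    /// Mirrors the v1-default values for the two fields kept here.
    fn full_replication_single_doc() -> Self {
        Self {
            gossip_initiate_interval_ms: Some(5_000),
            iroh_relay_url: None,
        }
    }
}

/// Hypothetical test helper: pin the relay URL and inherit every other
/// field from the v1 default via struct-update syntax.
fn test_config_with_relay(relay: &str) -> SpaceConfig {
    SpaceConfig {
        iroh_relay_url: Some(relay.to_string()),
        ..SpaceConfig::full_replication_single_doc()
    }
}
```

Because `HolographSpace::new` only consults the environment when `iroh_relay_url` is `None`, setting the field explicitly like this pins the relay regardless of the env vars described next.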

### Environment variables

Wake-18 D6 surfaces a small set of env-driven overrides for
deployment-time tuning. All are optional; unset means "use the
hard-coded default."

| Variable | Default | Purpose |
|---|---|---|
| `HOLOGRAPH_IROH_RELAY` | none | Iroh relay URL for cross-process transport. Resolved into `SpaceConfig.iroh_relay_url` by `HolographSpace::new` when the config field is `None`. Empty / whitespace-only is treated as unset. See `resolve_iroh_relay()`. |
| `HOLOGRAPH_IROH_RELAY_URL` | none | Older alias for `HOLOGRAPH_IROH_RELAY`. Used as a back-compat fallback (existing wiring in `holograph_wires.rs` reads this). New deployments should prefer the shorter name. |
| `HOLOGRAPH_IROH_PLAINTEXT` | `0` | Permit plain-text (`ws://`) relay connections. Spike-only; production should use TLS (`wss://`). |
| `HOLOGRAPH_BOOTSTRAP_URL` | derived from relay | Bootstrap server URL for `CoreBootstrap`. Defaults to the relay URL with any `/relay` suffix stripped (matches `kitsune2-bootstrap-srv`'s pattern). |
| `HOLOGRAPH_BOOTSTRAP_BACKOFF_MIN_MS` | `500` | Minimum re-bootstrap interval. Spike tightens K2's default (5000ms) so two-conductor convergence fits within the 15s test deadline. |
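The resolution order in this table can be modeled as a few pure functions. This is a sketch under stated assumptions, not the crate's code: the function names are hypothetical, env values are passed in as `Option<&str>` rather than read from the process so the logic is testable, and treating an empty `HOLOGRAPH_BOOTSTRAP_URL` as unset or an unparsable backoff value as the 500 ms default are guesses the table does not confirm.

```rust
use std::time::Duration;

/// Trim a raw env value; empty or whitespace-only counts as unset.
fn nonempty(raw: Option<&str>) -> Option<String> {
    raw.map(|v| v.trim())
        .filter(|v| !v.is_empty())
        .map(|v| v.to_string())
}

/// An explicit `SpaceConfig.iroh_relay_url` wins; otherwise
/// HOLOGRAPH_IROH_RELAY, then the HOLOGRAPH_IROH_RELAY_URL alias.
fn effective_relay(
    field: Option<&str>,
    env_short: Option<&str>,
    env_long: Option<&str>,
) -> Option<String> {
    field
        .map(|v| v.to_string())
        .or_else(|| nonempty(env_short))
        .or_else(|| nonempty(env_long))
}

/// HOLOGRAPH_BOOTSTRAP_URL if set; otherwise the relay URL with a
/// trailing "/relay" stripped (the kitsune2-bootstrap-srv layout).
fn effective_bootstrap(env_bootstrap: Option<&str>, relay: Option<&str>) -> Option<String> {
    nonempty(env_bootstrap)
        .or_else(|| relay.map(|r| r.strip_suffix("/relay").unwrap_or(r).to_string()))
}

/// HOLOGRAPH_BOOTSTRAP_BACKOFF_MIN_MS, defaulting to 500 ms when unset.
/// Assumption: unparsable values also fall back to the default.
fn bootstrap_backoff_min(raw: Option<&str>) -> Duration {
    let ms = raw
        .and_then(|v| v.trim().parse::<u64>().ok())
        .unwrap_or(500);
    Duration::from_millis(ms)
}
```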

### Programmatic overrides

`SpaceConfig.fetch_fallback_policy: FetchFallbackPolicy` lifts the
multi-peer fetch-fallback knobs into one structured policy:

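```rust
use std::time::Duration;

// Same values as FetchFallbackPolicy::default().
let policy = FetchFallbackPolicy {
    initial_timeout: Duration::from_secs(5), // grace period before asking other peers
    max_attempts: 3,                         // distinct peers tried over the entry's lifetime
    retry_budget: Duration::from_secs(30),   // wall-clock cap measured from first_seen
};
```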

When either cap (`max_attempts` or `retry_budget`) is hit, the pending
entry is dropped and `NotifyUp::notify_parent_fetch_permanent_failure`
fires so upstream layers can surface a "given up" signal.
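A sketch of how a fallback watcher might apply this policy to one pending entry. `FallbackAction`, `next_action`, and the `peers_tried` counter are hypothetical names; the struct mirrors the fields above without the serde derives, and the exact boundary checks (`>` for the wall-clock budget, `>=` for the peer cap) are assumptions.

```rust
use std::time::Duration;

/// Mirror of the policy fields (serde derives omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FetchFallbackPolicy {
    initial_timeout: Duration,
    max_attempts: u8,
    retry_budget: Duration,
}

impl Default for FetchFallbackPolicy {
    fn default() -> Self {
        Self {
            initial_timeout: Duration::from_secs(5),
            max_attempts: 3,
            retry_budget: Duration::from_secs(30),
        }
    }
}

/// Hypothetical per-tick decision for one pending missing-parent entry.
#[derive(Debug, PartialEq, Eq)]
enum FallbackAction {
    /// Still inside the grace period: give the authoring peer more time.
    Wait,
    /// Re-request the op from the next alternative peer.
    TryNextPeer,
    /// Drop the entry; the real crate fires
    /// NotifyUp::notify_parent_fetch_permanent_failure at this point.
    GiveUp,
}

/// `age` is the time since the entry's first_seen; `peers_tried` counts
/// alternative peers already asked over the entry's whole lifetime.
fn next_action(policy: &FetchFallbackPolicy, age: Duration, peers_tried: u8) -> FallbackAction {
    // Either cap ends the entry, whichever is hit first.
    if age > policy.retry_budget || peers_tried >= policy.max_attempts {
        FallbackAction::GiveUp
    } else if age < policy.initial_timeout {
        FallbackAction::Wait
    } else {
        FallbackAction::TryNextPeer
    }
}
```

Checking the budget before the grace period means an entry can never sit in `Wait` past its wall-clock cap, even under a misconfigured policy where `initial_timeout` exceeds `retry_budget`.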

## Lifecycle

`HolographSpace` is the top-level handle. It is `Arc`-wrapped and shared
across the K2 stack and the AD4M language wires:

```rust
let space: Arc<HolographSpace> = HolographSpace::new(cfg);
// ... use it ...
let remaining = space.shutdown().await?; // graceful drain + flush
```

`shutdown()`:

1. Sets a flag that makes `on_local_commit` reject new commits.
2. Stops the integration queue's fallback watcher.
3. Drains the queue (10s timeout).
4. Flushes the sled DB so the snapshot is durable.
5. Closes the `LocalCommitTarget` (transport teardown).

`Drop for HolographSpace` is the safety net for the case where the
process exits before `shutdown()` was called: it does a best-effort
synchronous flush, logs any error, and never panics.
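The shutdown ordering can be illustrated with a toy, std-only model. `ToySpace` is hypothetical: it is synchronous rather than async, uses an in-memory flag in place of the sled flush, and omits the fallback watcher and transport teardown (steps 2 and 5). It shows why the commit gate goes up before draining: the queue can then only shrink, so the timeout really bounds the drain.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for HolographSpace: a commit queue plus flags.
struct ToySpace {
    shutting_down: AtomicBool,
    flushed: AtomicBool,
    queue: Mutex<VecDeque<u64>>,
}

impl ToySpace {
    fn new() -> Self {
        Self {
            shutting_down: AtomicBool::new(false),
            flushed: AtomicBool::new(false),
            queue: Mutex::new(VecDeque::new()),
        }
    }

    /// Rejects new commits once shutdown has started.
    fn on_local_commit(&self, op: u64) -> Result<(), &'static str> {
        if self.shutting_down.load(Ordering::SeqCst) {
            return Err("space is shutting down");
        }
        self.queue.lock().unwrap().push_back(op);
        Ok(())
    }

    /// Stand-in for flushing the sled DB.
    fn flush(&self) -> Result<(), String> {
        self.flushed.store(true, Ordering::SeqCst);
        Ok(())
    }

    /// Returns how many queued ops were still undrained at the deadline.
    fn shutdown(&self, drain_timeout: Duration) -> Result<usize, String> {
        // Step 1: close the commit gate first, so the queue can only shrink.
        self.shutting_down.store(true, Ordering::SeqCst);
        // Step 3: drain until empty or until the deadline passes.
        let deadline = Instant::now() + drain_timeout;
        let remaining = {
            let mut q = self.queue.lock().unwrap();
            while Instant::now() < deadline && q.pop_front().is_some() {
                // A real queue would integrate the popped op here.
            }
            q.len()
        };
        // Step 4: flush so whatever was integrated is durable.
        self.flush()?;
        Ok(remaining)
    }
}

impl Drop for ToySpace {
    fn drop(&mut self) {
        // Safety net if shutdown() never ran: best-effort flush, log, never panic.
        if !self.flushed.load(Ordering::SeqCst) {
            if let Err(e) = self.flush() {
                eprintln!("best-effort flush on drop failed: {e}");
            }
        }
    }
}
```

The returned count plays the role of `remaining` in the lifecycle snippet above; whether the real `shutdown()` reports exactly this is an assumption.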
146 changes: 145 additions & 1 deletion rust-executor/crates/holograph/src/config.rs
@@ -12,6 +12,8 @@
//! 6. `HolographSpace` accepts a `SpaceConfig` (this struct) with arc
//!    policy, loc_fn, and validation regime.

use std::time::Duration;

use kitsune2_api::DhtArc;
use serde::{Deserialize, Serialize};

@@ -69,6 +71,42 @@ pub enum ValidationRegime {
SignatureAndParentsOnly,
}

/// Policy for how the integration queue falls back to alternative peers
/// when the authoring peer goes silent before delivering a missing
/// parent op.
///
/// Wake-18 D2: lifts the previously implicit constants
/// (`fallback_timeout` + `max_retry_peers`) into one structured policy
/// and adds a wall-clock retry budget so a long-tail failure on one
/// pending entry can't pin the watcher forever.
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)]
pub struct FetchFallbackPolicy {
/// How old a pending entry must be before the watcher even
/// considers re-requesting it from an alternative peer.
/// Gives the original source a chance to deliver before we widen
/// the search.
pub initial_timeout: Duration,
/// Maximum number of distinct peers to round-robin through before
/// declaring permanent failure (see Wake-18 D5). Counted across
/// the entry's full lifetime, not per tick.
pub max_attempts: u8,
/// Total wall-clock budget from `first_seen` to "give up." Once
/// exceeded, the entry is dropped with a permanent-failure event
/// even if `max_attempts` hasn't been hit. Keeps long-tail fetch
/// retries bounded.
pub retry_budget: Duration,
}

impl Default for FetchFallbackPolicy {
fn default() -> Self {
Self {
initial_timeout: Duration::from_secs(5),
max_attempts: 3,
retry_budget: Duration::from_secs(30),
}
}
}

/// Per-space configuration for a Holograph space.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct SpaceConfig {
@@ -78,17 +116,62 @@ pub struct SpaceConfig {
/// Override for K2's gossip-initiation cadence. None means use K2's
/// default (~120s). v1 spike uses 5_000ms — see SPIKE §1.1.
pub gossip_initiate_interval_ms: Option<u32>,
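/// How the integration queue handles missing-parent fetches when
/// the authoring peer goes silent. v1 default is 5s/3-peers/30s
/// (see `FetchFallbackPolicy::default`). `#[serde(default)]` lets
/// configs serialized before this field existed still deserialize.
#[serde(default)]
pub fetch_fallback_policy: FetchFallbackPolicy,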
/// Optional iroh relay URL override (Wake-18 D6). When `Some`,
/// holograph passes it to `IrohTransportFactory` via the
/// `IrohTransportModConfig.relay_url` slot.
///
/// When `None` (the default), `HolographSpace::new` resolves it lazily
/// from the `HOLOGRAPH_IROH_RELAY` env var (preferred) or
/// `HOLOGRAPH_IROH_RELAY_URL` (back-compat alias); see
/// [`resolve_iroh_relay`]. If neither is set, the K2 `transport_iroh`
/// factory picks its own relay.
#[serde(default)]
pub iroh_relay_url: Option<String>,
}

/// Read the iroh relay URL from the process environment.
///
/// Wake-18 D6: surfaces the relay override as a structured config
/// knob. Checks `HOLOGRAPH_IROH_RELAY` first (the canonical name
/// going forward), then `HOLOGRAPH_IROH_RELAY_URL` (the older name
/// used inside `holograph_wires.rs`). Values are trimmed, and empty or
/// whitespace-only values are treated as unset.
pub fn resolve_iroh_relay() -> Option<String> {
fn nonempty(v: String) -> Option<String> {
let t = v.trim();
if t.is_empty() {
None
} else {
Some(t.to_string())
}
}
std::env::var("HOLOGRAPH_IROH_RELAY")
.ok()
.and_then(nonempty)
.or_else(|| {
std::env::var("HOLOGRAPH_IROH_RELAY_URL")
.ok()
.and_then(nonempty)
})
}

impl SpaceConfig {
/// The v1 default: full arc, single doc, signature+parent validation,
/// 5s gossip cadence, default 5s/3-peers/30s fetch fallback, and no
/// pre-set iroh relay URL (resolved from env at space-construction
/// time if needed).
pub fn full_replication_single_doc() -> Self {
Self {
arc_policy: ArcPolicy::Full,
loc_fn_policy: LocFnPolicy::HashLoc,
validation_regime: ValidationRegime::SignatureAndParentsOnly,
gossip_initiate_interval_ms: Some(5_000),
fetch_fallback_policy: FetchFallbackPolicy::default(),
iroh_relay_url: None,
}
}

@@ -128,6 +211,67 @@ mod tests {
);
}

/// Serializes every test in this module that mutates the
/// `HOLOGRAPH_IROH_RELAY*` env vars. It lives at module scope because a
/// `static` declared inside one test fn is private to that fn and
/// could not be shared with the other env tests.
static ENV_GUARD: std::sync::Mutex<()> = std::sync::Mutex::new(());

/// Wake-18 D6: `resolve_iroh_relay` respects both env names, with
/// `HOLOGRAPH_IROH_RELAY` winning over `HOLOGRAPH_IROH_RELAY_URL`,
/// and treats whitespace-only values as unset.
///
/// Env vars are process-global and `cargo test` runs tests on parallel
/// threads by default, so two tests poking the same vars would race.
/// Locking a shared mutex keeps the rest of the suite parallel instead
/// of forcing `--test-threads=1`.
#[test]
fn resolve_iroh_relay_prefers_short_name() {
// Recover from poisoning so one failed env test doesn't cascade
// into spurious failures in the others.
let _g = ENV_GUARD.lock().unwrap_or_else(|e| e.into_inner());

// Snapshot the env so we can restore.
let prev_short = std::env::var("HOLOGRAPH_IROH_RELAY").ok();
let prev_long = std::env::var("HOLOGRAPH_IROH_RELAY_URL").ok();

// Neither set → None.
unsafe {
std::env::remove_var("HOLOGRAPH_IROH_RELAY");
std::env::remove_var("HOLOGRAPH_IROH_RELAY_URL");
}
assert_eq!(resolve_iroh_relay(), None);

// Only long set.
unsafe {
std::env::set_var("HOLOGRAPH_IROH_RELAY_URL", "https://long/relay");
}
assert_eq!(resolve_iroh_relay(), Some("https://long/relay".to_string()));

// Both set → short wins.
unsafe {
std::env::set_var("HOLOGRAPH_IROH_RELAY", "https://short/relay");
}
assert_eq!(
resolve_iroh_relay(),
Some("https://short/relay".to_string())
);

// Whitespace-only → treat as unset, fall through.
unsafe {
std::env::set_var("HOLOGRAPH_IROH_RELAY", " ");
}
assert_eq!(resolve_iroh_relay(), Some("https://long/relay".to_string()));

// Restore.
unsafe {
match prev_short {
Some(v) => std::env::set_var("HOLOGRAPH_IROH_RELAY", v),
None => std::env::remove_var("HOLOGRAPH_IROH_RELAY"),
}
match prev_long {
Some(v) => std::env::set_var("HOLOGRAPH_IROH_RELAY_URL", v),
None => std::env::remove_var("HOLOGRAPH_IROH_RELAY_URL"),
}
}
}

#[test]
fn sharded_policy_round_trips_arc() {
let arc = DhtArc::Arc(100, 200);