feat: soft-start on missing broker positions; add /status endpoint #9

Merged
hardyjosh merged 1 commit into main from feat/resilient-boot
May 23, 2026

@hardyjosh hardyjosh commented May 23, 2026

The boot check used to bail the whole process if any configured symbol
was missing from broker positions. Mid-flight, a vanishing position
degrades gently (the cached price stays, max-staleness eventually rejects
requests, other symbols keep working), but at boot the same condition
produced a total outage, and the "fail loud" rationale made that outage
the alert. That is a bad shape for prod: any future Fly restart after a
position goes to 0 brings down every symbol, not just the affected one.
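
To make the mid-flight behavior concrete, here is a minimal sketch of a max-staleness check, assuming each cached mark records when it was fetched. `CachedMark` and `fresh_price` are hypothetical names for illustration; the PR does not show the real cache types.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cached mark: the last price seen and when it was fetched.
#[derive(Debug, Clone, Copy)]
pub struct CachedMark {
    pub price: f64,
    pub fetched_at: Instant,
}

/// Returns the cached price only while it is no older than `max_staleness`.
/// Once the poller stops refreshing a symbol, its mark ages out and requests
/// for it start being rejected, while other symbols are unaffected.
pub fn fresh_price(mark: &CachedMark, now: Instant, max_staleness: Duration) -> Option<f64> {
    let age = now.saturating_duration_since(mark.fetched_at);
    if age <= max_staleness {
        Some(mark.price)
    } else {
        None
    }
}
```

A caller would map `None` to whatever rejection the server already uses for stale quotes.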

The hard bail also blocks new-token rollout: config.toml entries can't
merge until the hedging desk has acquired inventory, even though we can
verify wiring without a price.

Changes:

  • src/main.rs: replace anyhow::bail! with a per-symbol error! log and a
    summary warn!. Server starts in degraded mode with whatever symbols
    have marks; healthy symbols quote normally; missing symbols get the
    existing AppError::Unavailable (503) at request time.
  • src/lib.rs: add /status returning JSON with signer, configured
    symbols, and the currently-missing-from-cache set. /health stays
    lenient ("ok" whenever process is running) so Fly liveness doesn't
    recycle machines on a degraded-but-serving state. AppState now holds
    the configured symbol list so /status can compute the missing set.
  • tests/integration.rs: refactor test_app into test_app_with, which takes
    (addr, symbol, price_opt) tuples so tests can build partial-cache
    states. Add coverage for /status (both healthy and degraded) and an
    end-to-end test that a configured-but-uncached symbol returns 503
    from /context/v1 while other symbols continue to serve.
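
The src/main.rs change can be sketched roughly as follows. This is not the actual code: `BootReport` and `soft_boot_check` are hypothetical names, the marks are modeled as a plain `HashMap`, and `eprintln!` stands in for the `error!` and `warn!` macros so the example has no dependencies.

```rust
use std::collections::HashMap;

/// Hypothetical outcome of the boot-time position check.
#[derive(Debug, PartialEq)]
pub struct BootReport {
    pub healthy: Vec<String>,
    pub missing: Vec<String>,
}

/// Splits configured symbols by whether the initial broker fetch returned a
/// mark. It never returns an error: missing symbols are reported, not fatal.
pub fn soft_boot_check(configured: &[String], marks: &HashMap<String, f64>) -> BootReport {
    let mut report = BootReport { healthy: Vec::new(), missing: Vec::new() };
    for symbol in configured {
        if marks.contains_key(symbol) {
            report.healthy.push(symbol.clone());
        } else {
            // Stand-in for the per-symbol `error!` log.
            eprintln!("ERROR no broker position for configured symbol {symbol}");
            report.missing.push(symbol.clone());
        }
    }
    if !report.missing.is_empty() {
        // Stand-in for the summary `warn!`.
        eprintln!(
            "WARN starting in degraded mode: {} of {} symbols missing",
            report.missing.len(),
            configured.len()
        );
    }
    report
}
```

The point of the shape is that the function has no failure path, so a single empty position can no longer stop the server from binding its listener.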

Closes RAI-657.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Summary by CodeRabbit

  • New Features
    • Added /status endpoint that displays the current signer address and reports any missing configured symbols
    • Server now operates in degraded mode when configured symbols are unavailable, allowing partial service instead of aborting startup
    • /context/v1 endpoint returns HTTP 503 status when handling requests for unavailable configured symbols
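
As an illustration of the request-time behavior, the sketch below resolves a symbol against a cache and maps a configured-but-uncached symbol to `AppError::Unavailable`. Only that variant and its 503 status come from the PR; the `UnknownSymbol` variant, its 404 status, and the `quote_for` function are assumptions made for the example.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum AppError {
    /// Hypothetical variant: the symbol is not in config.toml at all.
    UnknownSymbol,
    /// Named in the PR: configured, but no mark is cached right now.
    Unavailable,
}

impl AppError {
    /// HTTP status for each error (the 404 is an assumption).
    pub fn status_code(&self) -> u16 {
        match self {
            AppError::UnknownSymbol => 404,
            AppError::Unavailable => 503,
        }
    }
}

/// Looks up a quote; healthy symbols succeed regardless of missing ones.
pub fn quote_for(
    symbol: &str,
    configured: &[String],
    cache: &HashMap<String, f64>,
) -> Result<f64, AppError> {
    if !configured.iter().any(|s| s.as_str() == symbol) {
        return Err(AppError::UnknownSymbol);
    }
    cache.get(symbol).copied().ok_or(AppError::Unavailable)
}
```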


linear-code Bot commented May 23, 2026

RAI-657

coderabbitai Bot commented May 23, 2026

Walkthrough

The PR implements degraded startup behavior for missing symbols. AppState now carries a configured_symbols list, passed through the constructor. A new GET /status endpoint returns missing symbols computed from the cache. Startup logs warnings instead of aborting when symbols are missing, and tests verify both cached and uncached symbol scenarios.
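
A dependency-free sketch of the state and status computation described above might look like this. The field names `signer`, `configured_symbols`, and `missing_symbols` come from the PR; the field types, the `HashMap` standing in for `QuoteCache`, the argument order of `AppState::new`, and the `status`/`health` method names are assumptions. The real handler also serializes the response as JSON, which is omitted here.

```rust
use std::collections::HashMap;

pub struct AppState {
    pub signer: String,
    /// Stand-in for the real QuoteCache: symbol -> last mark.
    pub cache: HashMap<String, f64>,
    pub configured_symbols: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub struct StatusResponse {
    pub signer: String,
    pub configured_symbols: Vec<String>,
    pub missing_symbols: Vec<String>,
}

impl AppState {
    pub fn new(signer: String, cache: HashMap<String, f64>, configured_symbols: Vec<String>) -> Self {
        Self { signer, cache, configured_symbols }
    }

    /// Body of a /status response: configured symbols with no cached mark.
    pub fn status(&self) -> StatusResponse {
        let missing_symbols = self
            .configured_symbols
            .iter()
            .filter(|symbol| !self.cache.contains_key(symbol.as_str()))
            .cloned()
            .collect();
        StatusResponse {
            signer: self.signer.clone(),
            configured_symbols: self.configured_symbols.clone(),
            missing_symbols,
        }
    }

    /// /health stays lenient: "ok" whenever the process is running.
    pub fn health(&self) -> &'static str {
        "ok"
    }
}
```

Keeping `health` independent of the missing set is what lets a liveness probe ignore a degraded-but-serving machine while monitoring reads `status`.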

Changes

Degraded startup with missing symbol reporting

| Layer | File(s) | Summary |
|---|---|---|
| AppState contract with configured symbols | src/lib.rs | AppState struct extended with configured_symbols: Vec<String> field; AppState::new constructor signature updated to accept and store configured symbols. |
| Status endpoint and response type | src/lib.rs | StatusResponse struct introduced with signer, configured_symbols, and missing_symbols fields. GET /status route wired to a handler that computes missing_symbols from cache and returns the response as JSON. |
| Startup degradation and state initialization | src/main.rs | Startup changed from fail-fast (bail on missing symbols) to degraded mode (log and continue). Background poll loop clones symbols for the poller, and AppState::new is called with configured symbols as an additional argument. |
| Test helpers and integration tests | tests/integration.rs | Test infrastructure refactored: test_app_with(entries) helper supports configurable symbol setup with selective cache population. New integration tests verify /status reporting and confirm /context/v1 returns 503 for uncached configured symbols. |
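
The test helper's idea of selective cache population can be sketched as below, assuming entries are `(addr, symbol, price_opt)` tuples as the PR description states. The `TestApp` struct and its fields are hypothetical; the real helper presumably builds a router rather than a plain struct.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the state a test server would be built from.
#[derive(Debug, Default)]
pub struct TestApp {
    /// Token address -> broker symbol, as config.toml would map them.
    pub tokens: HashMap<String, String>,
    /// Every configured symbol, cached or not.
    pub configured_symbols: Vec<String>,
    /// Only symbols whose entry carried a price.
    pub cache: HashMap<String, f64>,
}

/// Builds a test state from (addr, symbol, price_opt) entries. An entry with
/// `None` is configured but left out of the cache, which is the degraded case.
pub fn test_app_with(entries: &[(&str, &str, Option<f64>)]) -> TestApp {
    let mut app = TestApp::default();
    for &(addr, symbol, price_opt) in entries {
        app.tokens.insert(addr.to_string(), symbol.to_string());
        app.configured_symbols.push(symbol.to_string());
        if let Some(price) = price_opt {
            app.cache.insert(symbol.to_string(), price);
        }
    }
    app
}
```

Passing `Some` for every entry reproduces the healthy case, and mixing in a `None` gives the partial-cache state the new tests exercise.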

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • ST0x-Technology/st0x-oracle-server#8: Introduces broker mark polling that populates the quote cache with only fetched symbols, which directly feeds this PR's missing-symbol detection and degraded 503-response behavior.

Suggested reviewers

  • alastairong1

Poem

🐰 A hop through degraded grace,
Where missing symbols find their place,
No bailouts now, just logs that sing,
The status whispers what's missing—
Five-oh-three for partial dreams.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the two main changes: soft-start behavior on missing broker positions and the new /status endpoint. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.



This stack of pull requests is managed by Graphite. Learn more about stacking.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/main.rs`:
- Around line 87-95: Update the CLI/help text that documents
ALPACA_BROKER_ACCOUNT_ID and any related startup messaging to reflect that the
server now starts in a degraded mode instead of failing when symbols are
missing; locate the help text strings referenced near ALPACA_BROKER_ACCOUNT_ID
in main.rs (and any usage in the CLI/parser setup) and change wording to
indicate missing symbols will be served as 503 and the server exposes the
missing set via /status rather than aborting startup; keep references to
degraded/partial-serving behavior and mention monitoring via /status so the docs
align with QuoteCache/partial-serving logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b2ffb25d-3c33-4b33-9c7f-d3ab2f35f138

📥 Commits

Reviewing files that changed from the base of the PR and between 8b608e2 and aa36b8b.

📒 Files selected for processing (3)
  • src/lib.rs
  • src/main.rs
  • tests/integration.rs

Comment thread src/main.rs
Comment on lines +87 to 95
// first /context/v1 request doesn't race the poll loop. Missing
// symbols are logged loudly but no longer fatal: the server starts
// in a partial-serving state where healthy symbols quote normally
// and missing symbols return 503 at request time. /status exposes
// the missing set so monitoring can pick up the partial state. We
// chose this over the old hard-bail because the bail took the whole
// oracle down on the next Fly restart whenever any single position
// went to 0 — and the "alert" was the outage itself.
let cache = Arc::new(QuoteCache::new());


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update CLI help text to match degraded-startup behavior.

Startup now continues in degraded mode, but the CLI docs for ALPACA_BROKER_ACCOUNT_ID still say startup fails when a symbol is missing (lines 42-43). Please align the help text with the current behavior.

Suggested diff
-    /// Must be the issuer's account that holds every symbol listed in
-    /// config.toml — startup will fail loud if any registered symbol
-    /// has no current position.
+    /// Should be the issuer's account that backs configured symbols.
+    /// Missing positions do not block startup; affected symbols return
+    /// 503 until inventory appears (see /status).

@hardyjosh hardyjosh merged commit 297a2e8 into main May 23, 2026
5 of 6 checks passed
hardyjosh added a commit that referenced this pull request Jun 1, 2026
Wires CEG, DRAM, TSM and SGOV through the oracle. Each entry maps the
Base wrapper address to the Alpaca ticker; `config.toml` is the
runtime registry the server uses to resolve order tokens to symbols
when serving /context/v1.

Includes the matching `examples/probe_local.rs` `TOKENS` update so the
local smoke test can probe any of the four. Also folds in an
`ORACLE_URL` env override on probe_local — set
`ORACLE_URL=https://st0x-oracle-server.fly.dev/context/v1` to point the
probe at prod, otherwise it falls back to the local server at
`127.0.0.1:3000` as before.

Wrapper addresses match the entries in S01-Issuer registry PRs #21
and #22.

With PR #9's resilience landed, the four new tokens will start in the
"missing broker position" state (503 at /context/v1) until the issuer
omnibus acquires inventory — that's tracked in RAI-729. No deploy
gating needed; symbols come online automatically when positions appear.

Closes RAI-569.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
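
The `ORACLE_URL` override described in this commit message can be sketched as below. The function names are hypothetical, the `/context/v1` path on the local default is an assumption (the message only gives `127.0.0.1:3000`), and treating a blank value as unset is a choice made for the example rather than something the commit states.

```rust
use std::env;

/// Local default; the path component is assumed to mirror the prod URL.
pub const LOCAL_ORACLE_URL: &str = "http://127.0.0.1:3000/context/v1";

/// Picks the override when present and non-blank, else the local default.
pub fn resolve_oracle_url(override_value: Option<String>) -> String {
    override_value
        .filter(|value| !value.trim().is_empty())
        .unwrap_or_else(|| LOCAL_ORACLE_URL.to_string())
}

/// Reads the override from the environment, as the probe would at startup.
pub fn oracle_url_from_env() -> String {
    resolve_oracle_url(env::var("ORACLE_URL").ok())
}
```

Splitting the pure resolution from the environment read keeps the fallback logic testable without mutating process state.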
Siddharth2207 pushed a commit to Siddharth2207/st0x-oracle-server that referenced this pull request Jun 2, 2026