feat(telegram): route operational errors to a dedicated errors channel #269

Merged
spalen0 merged 2 commits into main from error-msg on Jun 10, 2026

@spalen0 spalen0 commented Jun 10, 2026

Collaborator

Problem

Operational errors are spamming the main per-protocol alert groups. Example reported on the aave topic:

GraphQL error in response: [{'message': 'bad indexers: {0x090f…: BadResponse(400), …}'}]

This came from an inline error send in protocols/aave/proposals.py that's caught and reported without crashing, so the run_with_alert crash wrapper never handled it — it went straight to the aave group.

What this does

Adds a dedicated errors channel so transient failures (GraphQL/fetch errors, retries, script crashes) route to one chat instead of the main groups.

Mechanism (utils/telegram.py)

  • New send_error_message(message, protocol) + ERROR_CHANNEL = "errors" key, reusing the existing channel-routing scheme — configure entirely via env vars.
  • Each message is prefixed with a [protocol] label so the merged feed shows which monitor produced it.
  • Safe fallback: if no errors destination is configured, it routes to the protocol's own channel exactly as before. Backward-compatible — nothing changes until the env var is set.
  • Always sent as plain text, and silent (notifications disabled) by default.
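
The routing rule in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in utils/telegram.py: `route_error`, the `destinations` mapping, and the injected `send` callable are hypothetical stand-ins for the real `send_error_message(message, protocol)` and its env-based channel lookup. It also assumes the `[protocol]` label is applied on the fallback path, which the description does not state.

```python
from typing import Callable, Mapping

ERROR_CHANNEL = "errors"  # channel key named in this PR


def route_error(
    message: str,
    protocol: str,
    destinations: Mapping[str, str],
    send: Callable[[str, str], None],
) -> str:
    """Send an operational error and return the channel key that was used.

    `destinations` (channel key -> chat id) and `send(channel, text)` are
    hypothetical; the real helper resolves both from environment variables.
    """
    # Use the errors channel only when a destination is configured for it;
    # otherwise fall back to the protocol's own channel.
    channel = ERROR_CHANNEL if ERROR_CHANNEL in destinations else protocol
    # Label the message so a merged feed shows which monitor produced it.
    # (Assumption: the label is kept on the fallback path too.)
    send(channel, f"[{protocol}] {message}")
    return channel
```

With an errors destination configured, an aave failure lands in the errors chat as `[aave] ...`; with none configured, the same call goes to the aave channel.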

Configuration (see .env.example)

# Pick ONE (topic takes precedence)
TELEGRAM_TOPIC_ID_ERRORS=321          # forum-style topics group
TELEGRAM_CHAT_ID_ERRORS=-1001234567   # or a standalone chat
TELEGRAM_BOT_TOKEN_ERRORS=...         # optional, falls back to DEFAULT bot
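
One way to read the precedence described in these comments is sketched below. The three `*_ERRORS` variable names come from the block above. `TELEGRAM_BOT_TOKEN_DEFAULT` is an assumed name for the DEFAULT bot's token, and `ErrorsDestination` and `resolve_errors_destination` are hypothetical helpers, not part of the repository.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ErrorsDestination:
    kind: str                  # "topic" or "chat"
    target: str                # raw env value: topic id or chat id
    bot_token: Optional[str]   # None if no token is configured at all


def resolve_errors_destination(env: Mapping[str, str]) -> Optional[ErrorsDestination]:
    """Return the configured errors destination, or None to trigger the fallback."""
    # The errors-specific token wins; otherwise use the DEFAULT bot.
    # TELEGRAM_BOT_TOKEN_DEFAULT is an assumed variable name.
    token = env.get("TELEGRAM_BOT_TOKEN_ERRORS") or env.get("TELEGRAM_BOT_TOKEN_DEFAULT")

    # Topic takes precedence when both topic and chat are set.
    topic = env.get("TELEGRAM_TOPIC_ID_ERRORS")
    if topic:
        return ErrorsDestination("topic", topic, token)

    chat = env.get("TELEGRAM_CHAT_ID_ERRORS")
    if chat:
        return ErrorsDestination("chat", chat, token)

    # Nothing configured: the caller routes to the protocol's own channel.
    return None
```

Passing `os.environ` as `env` would resolve against the live process environment; taking a plain mapping keeps the function easy to test.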

Call sites rerouted (16 files)

  • All unhandled crashes: utils/runner.py (covers every script via run_with_alert).
  • Reported case: aave/proposals.py (GraphQL error + Graph-API-retry failure).
  • Inline operational errors: fluid, maker, lido, strata, yearn (Envio), timelock, morpho (fetch/GraphQL failures), and the send_alert(LOW, …failed) diagnostics in ethena, usdai, apyusd, infinifi, maple, rtoken.
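
For the crash path, a wrapper of roughly this shape would send every unhandled exception through the error route. This is a sketch under assumptions: the real `run_with_alert` signature in utils/runner.py is not shown in this PR, so `run_with_alert_sketch`, its parameters, the message format, and the decision to re-raise are all hypothetical.

```python
from typing import Callable


def run_with_alert_sketch(
    main: Callable[[], None],
    protocol: str,
    send_error: Callable[[str, str], None],
) -> None:
    """Run a monitor script; report any unhandled exception, then re-raise.

    `send_error(message, protocol)` stands in for send_error_message.
    """
    try:
        main()
    except Exception as exc:
        # Crash reports go to the errors route, not the protocol's alert group.
        send_error(f"{type(exc).__name__}: {exc}", protocol)
        # Assumption: the crash still propagates so the job exits non-zero.
        raise
```

Errors that a script catches and reports inline never reach a wrapper like this, which is why the inline call sites listed above had to be rerouted one by one.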

Left in main groups (real protocol signals, not noise)

morpho 🚨 No vaults data found, ustb CRITICAL zero-price, ethena MEDIUM stale-data alerts.

Bonus cleanup

The timelock Envio-unreachable handler was fanning the error out to every monitored protocol's group (a big spam source) and referenced an undefined protocol on the GraphQL-error path (latent NameError). Both now collapse to a single labelled errors-channel send.
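
The difference between the two shapes can be illustrated as follows. Both functions and the "Envio unreachable" message text are hypothetical, and the fixed `"timelock"` label is an assumption about what the single send is labelled with.

```python
from typing import Callable, Iterable

Send = Callable[[str, str], None]  # (message, protocol)


def report_envio_unreachable_fanout(error: str, protocols: Iterable[str], send: Send) -> None:
    """Old shape: one copy of the same error per monitored protocol group."""
    for protocol in protocols:
        send(f"Envio unreachable: {error}", protocol)


def report_envio_unreachable_single(error: str, send_error: Send) -> None:
    """New shape: a single send labelled with a fixed monitor name.

    Using a constant label also avoids depending on a per-protocol variable,
    which is how the old GraphQL-error path hit an undefined name.
    """
    send_error(f"Envio unreachable: {error}", "timelock")
```

With N monitored protocols, the old shape produces N messages per outage and the new shape produces one.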

Testing

  • ruff check ✅ / ruff format (no changes) ✅
  • py_compile on all changed files ✅
  • mypy — no errors in the changed files (pre-existing repo errors are unrelated)

🤖 Generated with Claude Code

codex and others added 2 commits June 10, 2026 07:22
Add send_error_message() + an ERROR_CHANNEL key so transient failures
(GraphQL/fetch errors, retries, script crashes) go to a dedicated chat
instead of spamming the per-protocol alert groups. Each message is
prefixed with a [protocol] label, and falls back to the protocol's own
channel when no errors destination is configured so nothing is lost.

Reroute the runner crash-alert and the inline operational-error sends
across the fleet (aave, fluid, maker, lido, strata, yearn, timelock,
morpho, ethena, usdai, apyusd, infinifi, maple, rtoken). Real protocol
signals (e.g. morpho "No vaults data", ustb zero-price) stay in their
main groups.

Also collapse the timelock Envio-unreachable handler, which fanned the
error out to every protocol group and referenced an undefined `protocol`
on the GraphQL-error path, into a single labelled errors-channel send.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Repoint the runner crash-alert tests and the fluid fetch-error test at
send_error_message (the call site they mock changed), and drop the
plain_text/disable_notification assertions that are now internal to
send_error_message. Add dedicated send_error_message tests covering the
labelled errors-chat route, topic precedence, and the fallback to the
protocol's own channel when no errors destination is configured.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@spalen0 spalen0 marked this pull request as ready for review June 10, 2026 09:33
@spalen0 spalen0 merged commit da822f0 into main Jun 10, 2026
2 checks passed
@spalen0 spalen0 deleted the error-msg branch June 10, 2026 09:33