
Reliability and tooling hardening (2.5.0) #20

Merged: noxied merged 4 commits into main from hardening/p1-followups on Jun 13, 2026

@noxied noxied commented Jun 13, 2026

Owner

Implements the remaining P1 items from the comprehensive review.

Reliability

  • /healthz returns 503 when the loop is stale (no successful check within a generous multiple of CHECK_INTERVAL), so a wedged loop is detectable; /api/status adds seconds_since_last_check and check_interval.
  • check_ip reworked: notification, DDNS and MQTT failures are logged with stack traces and no longer counted as check failures. Only detection failures drive outage detection and adaptive backoff.
  • The status snapshot is read under a lock, so the API thread no longer sees inconsistent state while the loop is mutating it.
  • A failed geo lookup no longer overwrites previously known geo data.
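The scoping of failures in check_ip described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `detect` and `side_effects` parameters and the logger name are hypothetical, and the real function presumably also updates metrics and backoff state.

```python
import logging
from typing import Callable, Iterable, Optional

log = logging.getLogger("ip_monitor_sketch")  # hypothetical logger name


def check_ip(
    detect: Callable[[], str],
    side_effects: Iterable[Callable[[str], None]],
) -> Optional[str]:
    """Return the detected IP, or None when detection itself failed."""
    try:
        ip = detect()
    except Exception:
        # Only this branch counts as a check failure, i.e. the only
        # input to outage detection and adaptive backoff.
        log.exception("IP detection failed")
        return None

    for effect in side_effects:
        try:
            effect(ip)
        except Exception:
            # Notification, DDNS and MQTT errors are logged with a stack
            # trace but do not turn a successful detection into a failure.
            log.exception("side effect %r failed", effect)
    return ip
```

With this shape, a broken notifier cannot trigger a false outage, because the caller only sees `None` when detection itself raised.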

Tooling / types

  • Fixed the two implicit-Optional signatures in metrics and annotated the subsystem attributes in app.py, so mypy now runs with strict optional (dropped --no-strict-optional).
  • Added .flake8 (line length 88, Black-compatible) so local and CI flake8 agree.
  • Dependabot now tracks pip dependencies.

Tests

  • 7 new tests: /healthz (200, 503, and before the first check), status freshness fields, isolated notifier failure, geo not clobbered, and a concurrent snapshot-vs-check anchor. Full suite: 337 passing; flake8 and strict mypy clean.
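As an illustration of the geo-not-clobbered case, a test of that kind might look roughly like the sketch below. The helper `merge_geo` and the sample data are hypothetical stand-ins; the PR does not show how the project stores or merges geo data.

```python
from typing import Dict, Optional

Geo = Dict[str, str]


def merge_geo(known: Optional[Geo], looked_up: Optional[Geo]) -> Optional[Geo]:
    """Keep previously known geo data when the new lookup failed.

    A failed lookup is modelled here as None or an empty dict (assumption).
    """
    return looked_up if looked_up else known


def test_geo_not_clobbered() -> None:
    known = {"country": "NL", "city": "Amsterdam"}
    fresh = {"country": "DE", "city": "Berlin"}

    # A failed lookup leaves the known data in place.
    assert merge_geo(known, None) == known
    # A successful lookup replaces it.
    assert merge_geo(known, fresh) == fresh
    # Nothing known and nothing found stays empty.
    assert merge_geo(None, None) is None
```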

No configuration changes required. /healthz returning 503 on a genuinely stuck loop is the one behaviour change (an improvement for API_ENABLED=true users).
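For readers wondering what the staleness rule amounts to, here is a minimal sketch. The multiplier value, the function name and the return shape are assumptions; the PR only says "a generous multiple of CHECK_INTERVAL" and does not state what /healthz returns before the first check, so the 200 in that branch is a guess.

```python
import time
from typing import Optional, Tuple

STALE_MULTIPLIER = 3  # assumed value; the PR does not give the multiple


def healthz(
    last_success: Optional[float],
    check_interval: float,
    now: Optional[float] = None,
) -> Tuple[int, str]:
    """Map the age of the last successful check to an HTTP status."""
    current = time.monotonic() if now is None else now
    if last_success is None:
        # Assumption: before the first check the service reports healthy.
        return 200, "starting"
    age = current - last_success
    if age > STALE_MULTIPLIER * check_interval:
        # The loop has not completed a check for too long: likely wedged.
        return 503, "stale"
    return 200, "ok"
```

With a 60 second interval and this assumed multiplier, a loop that last succeeded 100 seconds ago is still healthy, while one that last succeeded 200 seconds ago returns 503.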

noxied added 4 commits June 13, 2026 08:22
- /healthz returns 503 when the loop is stale (no successful check within a
  generous multiple of CHECK_INTERVAL); /api/status exposes
  seconds_since_last_check and check_interval.
- Scope the check exception handling so notification, DDNS and MQTT failures are
  logged with stack traces and no longer counted as check failures; only
  detection failures drive outage detection and adaptive backoff.
- Read the status snapshot under a lock to avoid inconsistent reads from the API
  thread while the loop mutates state.
- Keep previously known geo data when a lookup fails.
- Fix the two implicit-Optional metrics signatures.

Add a .flake8 (line length 88, Black-compatible) so local and CI flake8 agree;
drop --no-strict-optional from mypy now that the code is strict-optional clean;
align pylint line length; add the pip ecosystem to Dependabot.

Cover /healthz staleness (200/503), the status freshness fields, isolated
notifier failures (no false outage or check-failure metric), geo not being
clobbered by a failed lookup, and a concurrent snapshot-vs-check anchor.

Document the /healthz staleness behaviour, the new status fields and the geo
freshness contract; bump to 2.5.0 with changelog and upgrade notes.
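The locked status snapshot and the two new status fields mentioned in these commits can be sketched as below. The class name `MonitorState`, its attributes and the `current_ip` key are hypothetical; only `seconds_since_last_check` and `check_interval` come from the PR.

```python
import threading
import time
from typing import Any, Dict, Optional


class MonitorState:
    """State shared between the check loop and the API thread (sketch)."""

    def __init__(self, check_interval: float) -> None:
        self.check_interval = check_interval
        self._lock = threading.Lock()
        self._current_ip: Optional[str] = None
        self._last_check: Optional[float] = None  # monotonic timestamp

    def record_check(self, ip: str, now: Optional[float] = None) -> None:
        # Called by the loop: both fields change together under the lock.
        with self._lock:
            self._current_ip = ip
            self._last_check = time.monotonic() if now is None else now

    def snapshot(self, now: Optional[float] = None) -> Dict[str, Any]:
        # Called by the API thread: copy the fields under the same lock so
        # the IP and its timestamp always belong to the same check.
        current = time.monotonic() if now is None else now
        with self._lock:
            ip, last = self._current_ip, self._last_check
        age = None if last is None else current - last
        return {
            "current_ip": ip,
            "seconds_since_last_check": age,
            "check_interval": self.check_interval,
        }
```

Copying the values inside the lock and building the response outside it keeps the critical section short, which matters if the loop holds the same lock while recording a check.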
@noxied noxied merged commit 117376c into main Jun 13, 2026
15 checks passed
@noxied noxied deleted the hardening/p1-followups branch June 13, 2026 07:25