Reliability and tooling hardening (2.5.0) #20
Merged
- /healthz returns 503 when the loop is stale (no successful check within a generous multiple of CHECK_INTERVAL); /api/status exposes seconds_since_last_check and check_interval.
- Scope the check exception handling so notification, DDNS and MQTT failures are logged with stack traces and no longer counted as check failures; only detection failures drive outage detection and adaptive backoff.
- Read the status snapshot under a lock to avoid inconsistent reads from the API thread while the loop mutates state.
- Keep previously known geo data when a lookup fails.
- Fix the two implicit-Optional metrics signatures.
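The locked snapshot and the geo retention described above could look roughly like the sketch below. The `Monitor` class, its field names and the `record_check`/`snapshot` methods are hypothetical, chosen only for illustration; they are not taken from the project's code.

```python
import copy
import threading


class Monitor:
    """Hypothetical stand-in for the monitor state; all names are illustrative."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = {"current_ip": None, "geo": None, "checks": 0}

    def record_check(self, ip, geo):
        # Loop thread: update all related fields while holding the lock.
        with self._lock:
            self._state["current_ip"] = ip
            # A failed lookup is modelled as geo=None; keep the last known data.
            if geo is not None:
                self._state["geo"] = geo
            self._state["checks"] += 1

    def snapshot(self):
        # API thread: copy under the same lock so a reader never observes
        # a half-applied update, and cannot mutate the live state afterwards.
        with self._lock:
            return copy.deepcopy(self._state)
```

Returning a deep copy rather than the live dictionary is one possible design choice: it keeps the lock held only briefly and lets the API thread serialise the result without further coordination.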
Add a .flake8 (line length 88, Black-compatible) so local and CI flake8 agree; drop --no-strict-optional from mypy now that the code is strict-optional clean; align pylint line length; add the pip ecosystem to Dependabot.
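For readers unfamiliar with the implicit-Optional problem, the fix is typically of the following shape. This is a generic sketch: `format_check_duration` is a hypothetical function, not one of the two metrics signatures changed in this PR.

```python
from typing import Optional

# Before (implicit Optional; recent mypy versions reject this by default):
#     def format_check_duration(duration: float = None) -> str: ...


def format_check_duration(duration: Optional[float] = None) -> str:
    """Format a duration in seconds, handling the 'not measured' case explicitly."""
    if duration is None:
        return "n/a"
    return f"{duration:.3f}s"
```

With the annotation made explicit, the type checker forces every caller and the function body to deal with `None`, which is the kind of checking that `--no-strict-optional` turns off.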
Cover /healthz staleness (200/503), the status freshness fields, isolated notifier failures (no false outage or check-failure metric), geo not being clobbered by a failed lookup, and a concurrent snapshot-vs-check anchor.
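The isolated-notifier case can be illustrated with a minimal sketch of scoped exception handling. The `run_check` function, the `stats` dictionary and its keys are assumptions made for this example; the project's actual `check_ip` and metric names may differ.

```python
import logging

logger = logging.getLogger("ip_monitor_sketch")


def run_check(detect, side_effects, stats):
    """Run one check; only a detection failure counts as a check failure."""
    try:
        ip = detect()
    except Exception:
        logger.exception("IP detection failed")
        stats["check_failures"] += 1
        return None

    stats["last_ip"] = ip
    for name, action in side_effects.items():
        try:
            action(ip)
        except Exception:
            # Logged with a stack trace, but deliberately not counted
            # as a check failure, so it cannot trigger a false outage.
            logger.exception("%s failed", name)
            stats["side_effect_errors"] += 1
    return ip
```

A test in this style would pass a notifier that raises, then assert that the detected IP is still returned and that `check_failures` stays at zero.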
Document the /healthz staleness behaviour, the new status fields and the geo freshness contract; bump to 2.5.0 with changelog and upgrade notes.
Implements the remaining P1 items from the comprehensive review.
Reliability
- `/healthz` returns 503 when the loop is stale (no successful check within a generous multiple of `CHECK_INTERVAL`), so a wedged loop is detectable; `/api/status` adds `seconds_since_last_check` and `check_interval`.
- `check_ip` reworked: notification, DDNS and MQTT failures are logged with stack traces and no longer counted as check failures. Only detection failures drive outage detection and adaptive backoff.

Tooling / types
- `metrics` Optional signatures fixed and `app.py` subsystem attributes annotated, so mypy now runs with strict optional (dropped `--no-strict-optional`).
- `.flake8` (line length 88, Black-compatible) so local and CI flake8 agree.

Tests
No configuration changes required.
`/healthz` returning 503 on a genuinely stuck loop is the one behaviour change (an improvement for `API_ENABLED=true` users).
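As a rough illustration of the staleness rule, the decision can be reduced to a single comparison. The multiplier of 3, the `healthz` function name and the response body shape are all assumptions here; the PR only specifies "a generous multiple of `CHECK_INTERVAL`".

```python
def healthz(seconds_since_last_check, check_interval, multiplier=3.0):
    """Return (http_status, body) for a /healthz-style probe.

    multiplier is an assumed value; the real threshold may differ.
    """
    stale = seconds_since_last_check > multiplier * check_interval
    body = {
        "status": "stale" if stale else "ok",
        "seconds_since_last_check": seconds_since_last_check,
        "check_interval": check_interval,
    }
    return (503 if stale else 200), body
```

Under these assumptions, with `CHECK_INTERVAL` set to 60 seconds a probe stays at 200 until more than 180 seconds have passed without a successful check, after which an orchestrator would see 503 and could restart the container.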