monitor: Configure systemd restart limits to better handle transient …#143

Open
mukilan wants to merge 1 commit into main from fix-intermittent-gh-issues

mukilan (Member) commented Jun 25, 2026

…GH issues.

This patch changes the default restart limits so that the monitor service doesn't get stuck in a failed state by restarting too quickly and exhausting systemd's default rate limits. The new configuration restarts the service with an exponential backoff: each delay grows by a factor of about 2.27, up to a maximum delay of 5 minutes, i.e. restarts are attempted after 5s, 11.3s, 26s, 58.5s, 2m 13s, 5m, 5m ...
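To make the delay sequence easier to check, here is a minimal Python sketch of the arithmetic described above. It assumes a 5 second initial delay, a growth factor of 2.27, and a 5 minute cap, as stated in this description. The function name `backoff_delays` and its parameters are hypothetical, and this models only the numbers quoted here, not systemd's own implementation.

```python
def backoff_delays(initial_s=5.0, rate=2.27, max_s=300.0, count=7):
    """Return the first `count` restart delays in seconds.

    Assumption: each delay is the previous one multiplied by `rate`,
    capped at `max_s`. This mirrors the PR description only.
    """
    return [min(initial_s * rate ** n, max_s) for n in range(count)]


delays = backoff_delays()
# Print each delay rounded to one decimal place for quick comparison.
print(", ".join(f"{d:.1f}s" for d in delays))
```

The fifth delay comes out at roughly 133 seconds, and every delay from the sixth onwards is held at the 300 second cap.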

We also set a maximum of 10 restarts in 35 minutes so we don't restart indefinitely. This can be relaxed later if we find that it still doesn't help with recovery from intermittent GH issues.
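As a rough sanity check on the 10-restarts-in-35-minutes limit, the sketch below sums the backoff delays for ten restarts, using the same assumed figures (5 second initial delay, factor 2.27, 5 minute cap). The helper name `total_backoff_seconds` is hypothetical. It ignores the time the service spends running before each failure and does not model how systemd actually counts starts within its rate-limit window.

```python
def total_backoff_seconds(restarts=10, initial_s=5.0, rate=2.27, max_s=300.0):
    """Sum the delays before each of the first `restarts` restart attempts.

    Assumption: delays grow geometrically by `rate` and are capped at `max_s`.
    """
    return sum(min(initial_s * rate ** n, max_s) for n in range(restarts))


total = total_backoff_seconds()
window_s = 35 * 60  # the 35 minute window from the description, in seconds
print(f"{total / 60:.1f} min of backoff within a {window_s // 60} min window")
```

Under these assumptions the ten delays add up to a little under 29 minutes, so a service that fails almost immediately on every start could still reach the limit inside the 35 minute window. Any time spent running between failures pushes the total closer to, or past, that window.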

Potentially fixes: #112
Testing: Not tested. The code is based on my understanding of the systemd docs.

…GH issues.

This patch changes the default restart limits so that the monitor
service doesn't get stuck in a failed state by restarting too quickly
and exhausting systemd's default rate limits. The new configuration
will attempt to restart the service with an [exponential backoff] rate
of 2.27 and a maximum delay of 5 minutes, i.e. the restarts will be
attempted after 5s, 11.3s, 26s, 58.5s, 2m 13s, 5m, 5m ...

We also set a maximum of 10 restarts in 35 minutes so we don't restart
indefinitely. This can be relaxed later if we find that it still doesn't
help with recovery from intermittent GH issues.

Potentially fixes: #112
Testing: Not tested. The code is based on my understanding of the
systemd docs.

[exponential backoff]: https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html?#RestartSteps=

Signed-off-by: Mukilan Thiyagarajan <mukilan@igalia.com>

mukilan requested a review from delan as a code owner on June 25, 2026, 08:02

Development

Successfully merging this pull request may close these issues.

The monitor service doesn't always recover from failures automatically.
