Describe the bug
MsftLinuxPatchAutoAssess.service is created with Type=forking, but the script runs Python in the foreground and never daemonizes. systemd waits for the fork-and-exit signal, hits the default TimeoutStartSec=90s, and SIGTERMs the cgroup. On any host where Python startup + the first package-manager call exceeds 90s, every auto-assessment is killed before it writes a result and the Portal shows "Pending" forever.
The same unit template is written for all distros, so the issue is latent everywhere but only reliably reproducible on slow hosts.
To Reproduce
- RHEL 8.4 VM (Arc or Azure) as example
- Make yum slow enough to exceed 90s. Wrap /usr/bin/yum:
# wrap yum so every call sleeps 120 seconds first
sudo mv /usr/bin/yum /usr/bin/yum.real
sudo tee /usr/bin/yum >/dev/null <<'EOF'
#!/bin/bash
sleep 120
exec /usr/bin/yum.real "$@"
EOF
sudo chmod +x /usr/bin/yum
- Install the extension, enable auto-assessment (assessmentMode=AutomaticByPlatform)
- sudo systemctl start MsftLinuxPatchAutoAssess.service
- journalctl -u MsftLinuxPatchAutoAssess.service
Expected behavior
Assessment runs to completion. .status flips from Transitioning to success.
Actual behavior
SIGTERM at exactly T+90s, MainPID=0, Result: failed (timeout). Status stays Transitioning indefinitely:
Jun 04 15:17:01 lab.opsmgr.net sudo[120526]: root : TTY=pts/1 ; PWD=/opt/lpe-repro/runtime/config ; USER=root ; COMMAND=/bin/systemctl start MsftLinuxPatchAutoAssess.service
Jun 04 15:17:01 lab.opsmgr.net sudo[120526]: pam_unix(sudo:session): session opened for user root by root(uid=0)
Jun 04 15:17:01 lab.opsmgr.net systemd[1]: Starting Microsoft Azure Linux Patch Extension - Auto Assessment...
Jun 04 15:17:16 lab.opsmgr.net sudo[120562]: root : PWD=/opt/lpe-repro ; USER=root ; COMMAND=/bin/id
Jun 04 15:17:16 lab.opsmgr.net systemd[1]: Started Session c128 of user root.
Jun 04 15:17:16 lab.opsmgr.net sudo[120562]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jun 04 15:17:16 lab.opsmgr.net sudo[120562]: pam_unix(sudo:session): session closed for user root
Jun 04 15:17:16 lab.opsmgr.net systemd[1]: session-c128.scope: Succeeded.
Jun 04 15:17:16 lab.opsmgr.net sudo[120566]: root : PWD=/opt/lpe-repro ; USER=root ; COMMAND=/bin/yum -q check-update
Jun 04 15:17:16 lab.opsmgr.net systemd[1]: Started Session c129 of user root.
Jun 04 15:17:16 lab.opsmgr.net sudo[120566]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jun 04 15:18:01 lab.opsmgr.net systemd[1]: Started Session 3800 of user root.
Jun 04 15:18:01 lab.opsmgr.net systemd[1]: session-3800.scope: Succeeded.
Jun 04 15:18:31 lab.opsmgr.net systemd[1]: MsftLinuxPatchAutoAssess.service: start operation timed out. Terminating.
Jun 04 15:18:31 lab.opsmgr.net systemd[1]: MsftLinuxPatchAutoAssess.service: Failed with result 'timeout'.
Jun 04 15:18:31 lab.opsmgr.net sudo[120526]: pam_unix(sudo:session): session closed for user root
Jun 04 15:18:31 lab.opsmgr.net systemd[1]: Failed to start Microsoft Azure Linux Patch Extension - Auto Assessment.
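The 90-second gap can be checked directly from the two systemd lines in the journal above. This is a minimal sketch, not part of the extension: `seconds_between` is a hypothetical helper, and because the short journal format omits the year, a placeholder year is prepended purely so the timestamps parse.

```python
from datetime import datetime


def seconds_between(start_line: str, end_line: str) -> float:
    """Return elapsed seconds between two journalctl short-format lines."""
    def parse(line: str) -> datetime:
        # The first three whitespace-separated fields are month, day, HH:MM:SS.
        stamp = " ".join(line.split()[:3])
        # Assumption: the year is not in the log, so 2000 is a placeholder only.
        return datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S")

    return (parse(end_line) - parse(start_line)).total_seconds()


start = ("Jun 04 15:17:01 lab.opsmgr.net systemd[1]: Starting Microsoft Azure "
         "Linux Patch Extension - Auto Assessment...")
end = ("Jun 04 15:18:31 lab.opsmgr.net systemd[1]: "
       "MsftLinuxPatchAutoAssess.service: start operation timed out. Terminating.")

elapsed = seconds_between(start, end)
print(elapsed)
```

The result matches the default TimeoutStartSec, which is what points at the start gate rather than at the assessment logic itself.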
Screenshots
N/A.
Desktop (please complete the following information):
• OS: RHEL 8.4 (reproduced); latent on all supported distros, same template is written unconditionally
• Cloud type: Arc-enabled (same code path as Azure)
• systemd: 239+ (default TimeoutStartSec=90s)
• Affected file: ServiceManager.py → create_service_unit_file()
Smartphone (please complete the following information):
N/A.
Additional context
Root cause
Type=forking requires the parent to fork a daemon child and exit before TimeoutStartSec. The script does neither. systemd's contract is violated → cgroup killed.
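The reasoning above can be written down as a deliberately simplified model. This is an illustrative sketch only: `start_outcome` is a hypothetical function, not systemd or extension code, and it ignores details such as PID-file handling and what happens after the start job completes.

```python
def start_outcome(service_type: str,
                  parent_exit_after_s: float,
                  timeout_start_s: float = 90.0) -> str:
    """Toy model of how systemd judges the start job of a unit.

    parent_exit_after_s: seconds until the ExecStart process exits.
    timeout_start_s: TimeoutStartSec; 0 is treated as "no timeout".
    """
    if service_type == "simple":
        # Type=simple: the unit counts as started once the process is launched,
        # so a long foreground run never hits the start timeout.
        return "started"
    if service_type == "forking":
        # Type=forking: the start job completes only when the parent exits.
        if timeout_start_s == 0 or parent_exit_after_s <= timeout_start_s:
            return "started"
        return "timeout"
    raise ValueError("unsupported service type in this sketch: " + service_type)


# A foreground script that needs ~10 minutes, as on the repro box.
print(start_outcome("forking", 600.0))  # slow host under the current unit
print(start_outcome("simple", 600.0))   # same run under the proposed unit
print(start_outcome("forking", 30.0))   # fast host: the bug stays hidden
```

The last case is why the problem is latent: on a fast host the foreground script exits within the timeout, which systemd accepts as the parent exiting.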
Proposed fix
In ServiceManager.create_service_unit_file():
• Type=forking → Type=simple (matches what the script actually does)
• Add TimeoutStartSec=0 (no meaningful start-gate for simple)
• Add RuntimeMaxSec=1800 (30-min safety cap against runaway hangs)
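A rough sketch of what the generated unit could look like with the three changes above applied. The real signature and template of create_service_unit_file() are not shown in this issue, so `render_unit`, its parameters, and the ExecStart path below are hypothetical placeholders; only the Description text and the three [Service] settings come from this report.

```python
def render_unit(description: str, exec_start: str,
                runtime_max_sec: int = 1800) -> str:
    """Render a systemd unit using the settings proposed in this issue.

    Hypothetical helper: the real template in ServiceManager.py may differ.
    """
    lines = [
        "[Unit]",
        "Description=" + description,
        "",
        "[Service]",
        "Type=simple",            # the script runs in the foreground
        "TimeoutStartSec=0",      # no start gate
        "RuntimeMaxSec=" + str(runtime_max_sec),  # cap on total run time
        "ExecStart=" + exec_start,
        "",
    ]
    return "\n".join(lines)


unit_text = render_unit(
    "Microsoft Azure Linux Patch Extension - Auto Assessment",
    "/bin/bash /path/to/MsftLinuxPatchAutoAssess.sh",  # placeholder path
)
print(unit_text)
```

Keeping the runtime cap as a parameter makes it easy to adjust if 30 minutes turns out to be too tight for very slow package managers.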
Validated on the repro box after the fix (run completed successfully in ~10 min):
Jun 04 23:44:56 lab.opsmgr.net systemd[1]: Started Microsoft Azure Linux Patch Extension - Auto Assessment.
Jun 04 23:45:11 lab.opsmgr.net sudo[3486]: root : PWD=/opt/lpe-repro ; USER=root ; COMMAND=/bin/id
Jun 04 23:45:11 lab.opsmgr.net sudo[3486]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jun 04 23:45:11 lab.opsmgr.net sudo[3486]: pam_unix(sudo:session): session closed for user root
Jun 04 23:45:12 lab.opsmgr.net sudo[3490]: root : PWD=/opt/lpe-repro ; USER=root ; COMMAND=/bin/yum -q check-update
Jun 04 23:45:12 lab.opsmgr.net sudo[3490]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jun 04 23:47:13 lab.opsmgr.net sudo[3564]: root : PWD=/opt/lpe-repro ; USER=root ; COMMAND=/bin/yum -q --security check-update
Jun 04 23:47:14 lab.opsmgr.net sudo[3564]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jun 04 23:49:15 lab.opsmgr.net sudo[3608]: root : PWD=/opt/lpe-repro ; USER=root ; COMMAND=/bin/yum -y install yum-utils
Jun 04 23:49:15 lab.opsmgr.net sudo[3608]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jun 04 23:51:17 lab.opsmgr.net sudo[3669]: root : PWD=/opt/lpe-repro ; USER=root ; ENV=LANG=en_US.UTF8 ; COMMAND=/bin/needs-restarting -r
Jun 04 23:51:17 lab.opsmgr.net sudo[3669]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jun 04 23:51:18 lab.opsmgr.net sudo[3669]: pam_unix(sudo:session): session closed for user root
Jun 04 23:51:18 lab.opsmgr.net sudo[3674]: root : PWD=/opt/lpe-repro ; USER=root ; COMMAND=/bin/yum -y install yum-plugin-ps
Jun 04 23:51:18 lab.opsmgr.net sudo[3674]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jun 04 23:53:20 lab.opsmgr.net sudo[3718]: root : PWD=/opt/lpe-repro ; USER=root ; COMMAND=/bin/yum ps
Jun 04 23:53:20 lab.opsmgr.net sudo[3718]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: Bootstrap environment: Prod
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: Building bootstrap container configuration...
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: Trying to connect IMDS end point. url:http://169.254.169.254/metadata/instance/compute?api-version=2019-06-01.
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: Connecting to IMDS endpoint...
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: - Exception from IMDS connection http request: URLError(ConnectionRefusedError(111, 'Connection refused'),)
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: - Failed to connect to IMDS end point. [Trial=1].
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: Connecting to IMDS endpoint...
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: - Exception from IMDS connection http request: URLError(ConnectionRefusedError(111, 'Connection refused'),)
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: - Failed to connect to IMDS end point. [Trial=2].
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: Connecting to IMDS endpoint...
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: - Exception from IMDS connection http request: URLError(ConnectionRefusedError(111, 'Connection refused'),)
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: - Failed to connect to IMDS end point. [Trial=3].
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: Connecting to IMDS endpoint...
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: - Exception from IMDS connection http request: URLError(ConnectionRefusedError(111, 'Connection refused'),)
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: - Failed to connect to IMDS end point. [Trial=4].
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: Connecting to IMDS endpoint...
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: - Exception from IMDS connection http request: URLError(ConnectionRefusedError(111, 'Connection refused'),)
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: - Failed to connect to IMDS end point. [Trial=5].
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: Failed to connect IMDS end point after 5 retries. This is expected in Arc VMs. VMCloudType is set to Arc.
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: --------------------------------------------------------------------------------------------------------------------------------
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: ,---. ,---. | | ,-.-. |
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: |---|,---,. .,---.,---. |---',---.|--- ,---.|---. | | |,---.,---.,---.,---.,---.,-.-.,---.,---.|---
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: | | .-' | || |---' | ,---|| | | | | | |,---|| |,---|| ||---'| | ||---'| ||
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: ` ''---'`---'` `---' ` `---^`---'`---'` ' ` ' '`---^` '`---^`---|`---'` ' '`---'` '`---'
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: `---'
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: --------------------------------------------------------------------------------------------------------------------------------
Jun 04 23:55:20 lab.opsmgr.net MsftLinuxPatchAutoAssess.sh[3453]: Completed building bootstrap container configuration.
Jun 04 23:55:20 lab.opsmgr.net systemd[1]: MsftLinuxPatchAutoAssess.service: Succeeded.
Status flipped from Transitioning to success; AssessmentState.json.lastStartInSecondsSinceEpoch updated correctly.