feat: temperature / event alerts (webhook · command · log) #2
Merged
Alert rules hang off the daemon's existing 1 Hz temperature loop, so the data cost is zero. The first cut has three triggers, all using data already in hand: temperature thresholds, the fan safety guard tripping, and SMC write failures.

- AlertEngine (PolicyEngine): a pure, edge-triggered state machine per rule with for-duration debounce, cooldown, and optional resolve; `now` is injected for tests.
- AlertActionRunner (daemon): runs actions OFF the state queue. Every exec and webhook is bounded by a timeout, a concurrency cap drops actions rather than letting them pile up, and failures are logged, not propagated. exec uses argv arrays only (no shell) and also exposes `SMCTL_ALERT_*` env vars.
- config.toml gains `[[alert]]` tables; writeConfig round-trips them so a battery/fan write never silently drops the user's alerts (regression-tested).
- XPC: `getAlertStatus` + `testAlert`; CLI: `smctl alert status` / `alert test`.

Security: exec runs as root; the trust boundary is whoever can edit the root-owned config, the same as `allow_below_minimum`.

Privacy: webhooks are a second (opt-in) outbound case; README/zh-CN updated to say so.

Tests: 12 engine cases (debounce/cooldown/resolve/triggers/state purge), TOML decode + writeConfig preservation, and a real-subprocess exec integration test. Live install/webhook E2E was intentionally not run to avoid disrupting the user's installed daemon; unit coverage is the safety net.
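The per-rule behaviour described for AlertEngine (for-duration debounce, cooldown, optional resolve, edge triggering, injected `now`) can be illustrated with a small state machine. This is a hedged Python sketch, not the PolicyEngine code: the `Rule`/`AlertEngine` names, the field names, the `"fire"`/`"resolve"` return values, and the choice to fire once the cooldown expires while the condition still holds are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape; field names are illustrative, not smctl's.
    name: str
    for_seconds: float = 0.0       # condition must hold this long before firing
    cooldown_seconds: float = 0.0  # minimum gap between two fires
    resolve: bool = False          # also report when a fired condition clears


@dataclass
class _State:
    pending_since: Optional[float] = None  # first tick of the current active run
    firing: bool = False                   # fired and not yet cleared
    last_fired: Optional[float] = None     # timestamp of the last fire


class AlertEngine:
    """Pure, edge-triggered per-rule state machine; the clock is passed in."""

    def __init__(self) -> None:
        self._states: Dict[str, _State] = {}

    def step(self, rule: Rule, active: bool, now: float) -> Optional[str]:
        st = self._states.setdefault(rule.name, _State())
        if not active:
            # Condition cleared: reset the debounce; report "resolve" only if we had fired.
            was_firing = st.firing
            st.pending_since, st.firing = None, False
            return "resolve" if (was_firing and rule.resolve) else None
        if st.firing:
            return None  # edge-triggered: one fire per active episode
        if st.pending_since is None:
            st.pending_since = now
        if now - st.pending_since < rule.for_seconds:
            return None  # still debouncing
        if st.last_fired is not None and now - st.last_fired < rule.cooldown_seconds:
            return None  # in cooldown; fires later if the condition still holds
        st.firing, st.last_fired = True, now
        return "fire"

    def purge(self, live_names: Iterable[str]) -> None:
        # Forget state for rules that were removed from the config.
        keep = set(live_names)
        self._states = {k: v for k, v in self._states.items() if k in keep}

    def tracked(self) -> set:
        # Names of rules that currently have state.
        return set(self._states)
```

Fed one tick per second, a rule with `for_seconds=3` fires on the fourth consecutive active tick, stays silent while the condition persists, and fires again only after the condition clears, debounces again, and the cooldown has elapsed. Because `now` is a parameter, tests can drive the clock directly instead of sleeping.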
Keep daemon status errors separate from alert write-error state so general runtime issues do not fire SMC write failure alerts. Track SMC write failures from battery and fan write paths, clear the signal after a successful write, and cover both false-positive and recovery behavior in daemon tests.
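A minimal sketch of the separation this commit describes, using hypothetical names (`DaemonSignals`, `note_runtime_error`, `note_smc_write`): general runtime errors only update the status string, while the battery and fan write paths alone set or clear the flag that the write-error trigger reads. It also assumes a single daemon-wide flag, so a successful write on either path clears it; per-path tracking would be an equally valid reading of the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DaemonSignals:
    # Hypothetical names. General runtime problems surface in status only...
    status_error: Optional[str] = None
    # ...while this flag is the only input to the write-error alert trigger.
    smc_write_failed: bool = False
    last_failed_path: Optional[str] = None

    def note_runtime_error(self, message: str) -> None:
        # e.g. a transient sensor read problem: visible in status, never an SMC write alert
        self.status_error = message

    def note_smc_write(self, path: str, ok: bool) -> None:
        # Called from both the battery and the fan write paths (labels are assumed).
        if ok:
            # A successful write clears the signal (assumption: one daemon-wide flag).
            self.smc_write_failed = False
            self.last_failed_path = None
        else:
            self.smc_write_failed = True
            self.last_failed_path = path

    def write_error_active(self) -> bool:
        # The boolean a write-error alert rule would evaluate on each tick.
        return self.smc_write_failed
```

The two test concerns named in the commit map directly onto this shape: a runtime error must leave `write_error_active()` false (no false positive), and a later successful write must bring it back to false (recovery).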
What it does
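Alerts hang off the daemon's existing once-per-second temperature monitoring loop (the fan guard already depends on it), so data collection costs nothing extra. The first version has three trigger sources, all using data already in hand: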
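- `temp`: a sensor stays above its threshold
- `guard`: the fan safety guard trips (fans are forced back to auto)
- `write-error`: an SMC write fails read-back verification

Three action types:

`webhook` (HTTP POST) / `exec` (run a command) / `log`.

Architecture highlights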
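- AlertEngine (PolicyEngine, a pure-logic edge-triggered state machine): `for` duration + cooldown + edge triggering; `now` is injected so it can be tested
- AlertActionRunner runs on its own queue and never blocks the 1 Hz loop; exec and webhook both have timeouts; when the concurrency gate is full, new actions are dropped rather than queued; failures are only logged, never propagated
- writeConfig serializes `[[alert]]`; otherwise a single battery/fan write would wipe out the user's alerts (regression test added)

Security / privacy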
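- exec runs as root; the trust boundary is whoever can edit `/etc/smctl/config.toml`, exactly the same as `allow_below_minimum`.
- Commands only go through argv arrays (never a shell), so there is no injection surface.

Tests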
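- 12 engine cases (debounce / cooldown / resolve / triggers / state purge)
- `[[alert]]` TOML decode + writeConfig alert-preservation regression
- Real-subprocess exec integration test (uses `/usr/bin/touch` to verify argv placeholder substitution)

Not done (being upfront)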
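To avoid disrupting the user's installed production daemon, I did not run a live E2E (installing the new daemon and firing a real webhook). The XPC plumbing mechanically mirrors existing, working methods; together with the engine/config/exec unit tests, that serves as the safety net. After merging, I recommend a real-hardware webhook acceptance test in a clean environment.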
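For the recommended post-merge webhook acceptance check, a throwaway local receiver is enough to confirm that a POST arrives. The Python sketch below pairs one with a sender that behaves the way the runner is described (a hard timeout; failures are logged and swallowed, not propagated). The JSON payload, the 5-second default timeout, and every name here are assumptions; smctl's actual webhook payload is not specified in this PR.

```python
import json
import logging
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

log = logging.getLogger("alert-webhook")


def post_webhook(url: str, payload: dict, timeout: float = 5.0) -> bool:
    """POST one alert as JSON. Bounded by `timeout`; failures are logged, never raised."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception as exc:  # connection refused, timeout, non-2xx HTTPError, ...
        log.warning("webhook to %s failed: %s", url, exc)
        return False


class RecordingHandler(BaseHTTPRequestHandler):
    received: list = []  # decoded JSON bodies, shared across requests

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", "0"))
        self.received.append(json.loads(self.rfile.read(length)))
        self.send_response(204)
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # keep the console quiet


def start_receiver() -> HTTPServer:
    # Port 0 asks the OS for a free port; read it back from server_address.
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Pointing a test rule's webhook URL at the receiver's address (port from `server.server_address[1]`) and then running `smctl alert test` should show whether the daemon's POST actually arrives, assuming `alert test` exercises the webhook action. The receiver only binds `127.0.0.1`, so it must run on the same machine as the daemon.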
Follow-up
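Step 3 (`alert list` and other enhancements) and step 4 (turning CPU throttling into alerts, power-policy writes) are on the roadmap. The throttling data comes from `power status` in #1.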