feat(signalk): pin issuing CA via TOFU instead of leaf certificate #1028
Adversarial code review (3 fresh-context reviewers) + fixes
Ran an independent multi-persona review of the implementation. It caught a critical bug I'd introduced; all three code bugs are fixed in this branch.
Fixed
Accepted / follow-up (not fixed here)
Verification status
Compiles on esp32c3+SSL; pure decision unit tests pass (6/6). The callback path (incl. the fixed depth-0 handling and the cross-leg stash) is not verified on-device: mbedTLS isn't in the native build. The reviewer-recommended on-device case (stored fingerprint + mismatched self-signed CA:TRUE leaf → reject) should be run on hardware.
Added leaf-identity (SAN) binding in CA-anchor mode, folded into the feature commit.
The earlier design pinned the CA but kept the hostname check off, which is only safe for a single-purpose private CA. As discussed, that's a real hole, and a regression, for a public CA: pinning the Let's Encrypt intermediate plus no identity check would accept any LE-signed leaf (any
Now: capture the leaf's DNS SAN set with the CA and require a reconnecting leaf to present that same identity and chain to the pinned CA. A leaf for a different name from the same CA is rejected. A CA is adopted only when the leaf has a bindable identity (≥1 SAN); otherwise it falls back to leaf-fingerprint mode.
Compiles on esp32c3+SSL; native decision tests 8/8. Still needs on-device verification of the runtime SAN comparison.
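The capture rule described in this comment can be sketched as a pure decision function, in the spirit of the native decision tests. This is a minimal illustration, not the code in this branch: `PinMode`, `PresentedChain`, and `decide_capture_mode` are hypothetical names, and the two inputs are assumed to have already been extracted from the presented certificates.

```cpp
#include <string>
#include <vector>

// Hypothetical types for illustration; not the identifiers used in the PR.
enum class PinMode { kLeafFingerprint, kCaAnchor };

struct PresentedChain {
  // True if the server presented a certificate with basicConstraints CA:TRUE.
  bool has_ca_cert;
  // DNS SAN entries found on the leaf certificate.
  std::vector<std::string> leaf_sans;
};

// First-use capture decision: a CA is adopted only when the leaf carries a
// bindable identity (at least one DNS SAN); otherwise the leaf is pinned.
PinMode decide_capture_mode(const PresentedChain& chain) {
  if (chain.has_ca_cert && !chain.leaf_sans.empty()) {
    return PinMode::kCaAnchor;
  }
  return PinMode::kLeafFingerprint;
}
```

Keeping the decision free of mbedTLS types is what would let it run in a host build; the certificate parsing that feeds it still needs on-device testing.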
On-device Pass 1 (migration + leaf-mode) verified on a HALSER (ESP32-C3) against a leaf-only Signal K server:
So the migration is non-breaking and the leaf-mode path, the cross-task pending/commit, and the new state machine all run on real hardware. Also rebased onto current main (no TOFU files touched by the 2 intervening commits).
Still to verify on-device (needs the server to present its CA, i.e. halos-core-containers#197 deployed): CA adoption, SAN identity binding, the verified upgrade from a leaf pin, and leaf-rotation survival.
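The pending/commit step mentioned above (a candidate is stashed during the handshake and persisted only after the connection succeeds) could look roughly like the sketch below. It is an assumption-laden illustration: `AnchorCandidate` and `PendingAnchor` are hypothetical names, the candidate is reduced to a single string, and `std::mutex` stands in for whatever cross-task synchronization the real code uses.

```cpp
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical candidate record; the real one would carry more fields.
struct AnchorCandidate {
  std::string ca_pem;
};

// Holds a candidate captured inside the TLS verify callback until the
// connection is known to have succeeded.
class PendingAnchor {
 public:
  // Called before a connection attempt so stale candidates never leak in.
  void clear() {
    std::lock_guard<std::mutex> lock(mutex_);
    pending_.reset();
  }

  // Called from the handshake callback; nothing is persisted here.
  void stash(AnchorCandidate candidate) {
    std::lock_guard<std::mutex> lock(mutex_);
    pending_ = std::move(candidate);
  }

  // Called only after the connection succeeded. Returns the candidate to
  // persist (if any) and empties the stash.
  std::optional<AnchorCandidate> commit() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::optional<AnchorCandidate> out = std::move(pending_);
    pending_.reset();
    return out;
  }

 private:
  std::mutex mutex_;
  std::optional<AnchorCandidate> pending_;
};
```

Because persistence happens only in `commit()`, a handshake that fails or is abandoned leaves stored state untouched.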
Pass 2 verified on-device (HALSER + halosdev with the CA chain served, i.e. halos-core-containers#197 deployed):
All restored to a healthy state afterward.
TOFU pinned the SHA-256 of the leaf certificate, so every certificate rotation broke the pinned connection and surfaced only as a generic disconnect. Servers behind a stable private CA (e.g. HaLOS, which re-signs the leaf with a fresh key on every renewal and hostname change) rotate the leaf routinely, so the pin broke constantly.

Pin the issuing CA instead, with two internal modes chosen by what the server presents:

- CA-anchor mode: the captured CA is installed as the mbedTLS trust anchor and the leaf is validated against it (REQUIRED-mode chain validation), so the pin survives leaf rotation.
- leaf-fingerprint mode: exact SHA-256 leaf match (previous behavior), retained for servers that present no CA.

The anchor is captured by cryptographic role (basicConstraints CA:TRUE), not by position. A candidate is stashed during the handshake and persisted only after the connection succeeds, so an unauthenticated MITM handshake cannot plant an anchor.

The mode is fixed at first capture: a device already pinned to a leaf stays in leaf-fingerprint mode and does not adopt a CA from a later reconnect. Moving an existing leaf pin to CA pinning requires a manual reset-tofu followed by a fresh first-use capture.

Bind identity in CA-anchor mode: the leaf's DNS SAN set is captured with the CA, and a reconnecting leaf must present the same identity AND chain to the pinned CA. This keeps CA pinning safe even against a public CA such as Let's Encrypt -- a valid leaf for a different name signed by the same CA is rejected. A CA is adopted only when the leaf has a bindable identity; otherwise the device falls back to leaf-fingerprint pinning.

Certificate verification failures set a distinct kSKWSCertificateError state, mapped centrally in set_connection_state so all four TLS paths are covered, so the Status page shows "Certificate verification failed" rather than a plain disconnect. The pinned certificate's CN and role are exposed read-only for auditability.

Adds a host `native` env and pure unit tests for the capture decision.

Refs #1027

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HE2LwSbPBfV7REXDMMiLwu
Regenerate src/sensesp/net/web/autogen/frontend_files.h from the built web UI so the embedded bundle matches the SSL/TLS settings panel update (pinned-certificate identity / CN display) in the CA-pinning change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HE2LwSbPBfV7REXDMMiLwu
Summary
Pins the Signal K server's issuing CA via TOFU instead of the leaf certificate, with leaf-identity (SAN) binding so it stays secure even under a public CA. Closes #1027.
Previously TOFU pinned SHA256(leaf); any rotation broke the connection and surfaced only as a generic "disconnected". Servers behind a stable private CA (e.g. HaLOS, which re-signs the leaf with a fresh key on every renewal and hostname change) rotate the leaf routinely.
Security invariant
A connection is accepted iff the presented leaf (a) chains — valid signature and validity dates — to the pinned trust anchor, and (b) presents the same DNS SAN identity captured when the anchor was pinned. Identity is compared against the captured value, not the address the device dialed, so connect-by-IP works while still binding identity. This keeps CA-pinning safe even with a public CA (Let's Encrypt): a valid leaf for a different name signed by the same CA fails (b). Leaf-fingerprint mode binds identity inherently (exact cert match).
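The invariant for CA-anchor mode can be written as a small predicate. This is a sketch under stated assumptions, not the PR's implementation: all names are hypothetical, chain validation is represented by a boolean that the TLS stack would supply, and "same identity" is assumed here to mean equality of the DNS SAN sets, compared case-insensitively and ignoring order. Note that the dialed address is deliberately not an input.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical record stored when the anchor was pinned.
struct PinnedAnchor {
  std::vector<std::string> captured_sans;
};

// Hypothetical view of the leaf presented on reconnect.
struct ReconnectingLeaf {
  bool chains_to_anchor;          // (a) signature and validity dates check out
  std::vector<std::string> sans;  // DNS SAN entries on the presented leaf
};

// Lowercase, sort, and de-duplicate so the comparison is set-based.
static std::vector<std::string> normalized(std::vector<std::string> names) {
  for (auto& name : names) {
    std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
      return static_cast<char>(std::tolower(c));
    });
  }
  std::sort(names.begin(), names.end());
  names.erase(std::unique(names.begin(), names.end()), names.end());
  return names;
}

// Accept iff (a) the leaf chains to the pinned anchor and (b) it presents
// the identity captured at pin time.
bool accept_connection(const PinnedAnchor& pin, const ReconnectingLeaf& leaf) {
  if (!leaf.chains_to_anchor) return false;
  if (pin.captured_sans.empty()) return false;  // no bindable identity
  return normalized(pin.captured_sans) == normalized(leaf.sans);
}
```

With this shape, a valid leaf for a different name signed by the same CA passes (a) but fails (b), which is the public-CA case the invariant is meant to cover.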
What changed
- mbedtls_ssl_conf_ca_chain, leaf validated against it under esp-tls VERIFY_REQUIRED) and leaf-fingerprint mode (exact SHA256(leaf), previous behavior, for servers presenting no CA).
- basicConstraints CA:TRUE), not by position. A CA is adopted only if the leaf also has a bindable identity (≥1 DNS SAN); otherwise it falls back to leaf-fingerprint mode.
- connect_ws() clears the stash so only the WS leg populates it.
- reset-tofu followed by a fresh first-use capture. Pairs with feat(certs): serve leaf + device CA chain from Traefik for client CA pinning halos-org/halos-core-containers#197.
- kSKWSCertificateError), mapped centrally in set_connection_state so all four TLS paths are covered; Status page shows "Certificate verification failed".
Verification
- pioarduino_esp32c3 with SSL; examples/ssl_connection.cpp compiles + links on pioarduino_esp32 (flash 82.2%).
- native env unit tests pass 7/7, including that a mismatched leaf is rejected even with a CA offered, that a CA with no bindable leaf identity falls back to leaf-pinning, and that a leaf-pinned device stays in leaf mode when a CA is later presented (no auto-upgrade).
- Connected, deltas flowing (TLS chain validation + SAN binding holding live). The Signal K → SSL/TLS panel renders the rebuilt embedded bundle's new Pinned Certificate field: HaLOS Device CA (halosdev.local) (CA) (CN + captured identity + role). Device serves the new hashed asset (HTTP 200).
- esp_http_client paths are out of CI scope (mbedTLS isn't in the native build) and rely on manual hardware testing.
Follow-ups / decisions not verified on hardware
- reset-tofu / config-PUT and the anchor: both can clear/set the trust anchor and ride the device's general HTTP auth: protected when an access password is set, reachable when auth is off (the default, which exposes the whole API regardless, not anything TOFU-specific). CA pinning raises the impact of an unauthorized anchor write (a planted CA vs one leaf), but it's not a new hole and the auth model is the mitigation. Non-gating follow-up: Consider gating TOFU trust-anchor writes (reset-tofu, config PUT) when device auth is off #1031.
- basicConstraints CA:TRUE read via crt->MBEDTLS_PRIVATE(ca_istrue) (no public getter in mbedTLS 3.x); SAN via subject_alt_names + mbedtls_x509_parse_subject_alt_name (public).
- reset-tofu/re-pin: accepted as a rare, intentional event.
🤖 Generated with Claude Code