feat(signalk): pin issuing CA via TOFU instead of leaf certificate #1028

Merged: mairas merged 2 commits into main from feat/tofu-ca-pin on Jun 22, 2026

mairas commented Jun 21, 2026

Collaborator

Summary

Pins the Signal K server's issuing CA via TOFU instead of the leaf certificate, with leaf-identity (SAN) binding so it stays secure even under a public CA. Closes #1027.

Previously TOFU pinned SHA256(leaf); any rotation broke the connection and surfaced only as a generic "disconnected". Servers behind a stable private CA (e.g. HaLOS, which re-signs the leaf with a fresh key on every renewal and hostname change) rotate the leaf routinely.

Security invariant

A connection is accepted iff the presented leaf (a) chains — valid signature and validity dates — to the pinned trust anchor, and (b) presents the same DNS SAN identity captured when the anchor was pinned. Identity is compared against the captured value, not the address the device dialed, so connect-by-IP works while still binding identity. This keeps CA-pinning safe even with a public CA (Let's Encrypt): a valid leaf for a different name signed by the same CA fails (b). Leaf-fingerprint mode binds identity inherently (exact cert match).
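
As a rough illustration of the invariant above, the accept decision can be written as a pure predicate. This is a minimal sketch, not the SensESP code: `HandshakeResult` and `accept_connection` are hypothetical names, and the SAN sets are assumed to be normalized already.

```cpp
#include <set>
#include <string>

// Hypothetical summary of one TLS handshake; not a SensESP or mbedTLS type.
struct HandshakeResult {
  bool chains_to_pinned_anchor;         // (a) signature and validity dates check out
  std::set<std::string> leaf_dns_sans;  // normalized DNS SANs on the presented leaf
};

// Accept iff (a) the leaf chains to the pinned anchor and (b) its DNS SAN set
// equals the set captured at pin time. The dialed address is never consulted,
// which is why connect-by-IP keeps working.
bool accept_connection(const HandshakeResult& hs,
                       const std::set<std::string>& pinned_sans) {
  // Assumption: a CA pin always carries at least one captured SAN, so an empty
  // pinned set is treated as "nothing to bind to" and rejected.
  if (pinned_sans.empty()) return false;
  return hs.chains_to_pinned_anchor && hs.leaf_dns_sans == pinned_sans;
}
```

With a public CA, a leaf for another name would pass (a) but fail the set comparison in (b), which is the case the invariant is meant to close.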

What changed

  • Two internal pin modes, chosen by what the server presents: CA-anchor mode (captured CA installed as the mbedTLS trust anchor via mbedtls_ssl_conf_ca_chain, leaf validated against it under esp-tls VERIFY_REQUIRED) and leaf-fingerprint mode (exact SHA256(leaf), previous behavior, for servers presenting no CA).
  • Capture by cryptographic role (basicConstraints CA:TRUE), not by position. A CA is adopted only if the leaf also has a bindable identity (≥1 DNS SAN); otherwise it falls back to leaf-fingerprint mode.
  • Identity (SAN) binding: the leaf's normalized DNS SAN set is captured alongside the CA; a reconnecting leaf must present that same identity and chain to the pinned CA. This is what closes the public-CA hole.
  • Stash-then-commit: the candidate anchor is stashed during the handshake and persisted only after the connection succeeds, so an unauthenticated MITM handshake can't plant one. connect_ws() clears the stash so only the WS leg populates it.
  • Mode fixed at first capture (no auto-upgrade): a device already pinned to a leaf stays in leaf-fingerprint mode and never adopts a CA from a later reconnect; moving an existing leaf pin to CA-anchor mode requires a manual reset-tofu followed by a fresh first-use capture. Pairs with feat(certs): serve leaf + device CA chain from Traefik for client CA pinning halos-org/halos-core-containers#197.
  • Distinct cert-error status (kSKWSCertificateError), mapped centrally in set_connection_state so all four TLS paths are covered; Status page shows "Certificate verification failed".
  • The pinned cert's CN + role are exposed read-only (frontend source + embedded bundle rebuilt).
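
The capture rules in the list above (role over position, SAN required, no auto-upgrade) can be sketched as a small pure decision function. All names here are hypothetical, and the `depth > 0` condition reflects the review fix described later in this thread rather than anything shown in the diff itself.

```cpp
#include <cstddef>
#include <vector>

enum class PinMode { kLeafFingerprint, kCaAnchor };

// Hypothetical view of one certificate seen by the verify callback.
struct PresentedCert {
  int depth;   // 0 = leaf, >0 = issuer certificates
  bool is_ca;  // basicConstraints CA:TRUE
};

// First-use capture: adopt a CA only if one is offered above the leaf AND the
// leaf has at least one DNS SAN to bind to; otherwise pin the leaf itself.
PinMode choose_capture_mode(const std::vector<PresentedCert>& chain,
                            std::size_t leaf_dns_san_count) {
  bool ca_offered = false;
  for (const PresentedCert& cert : chain) {
    // Role, not position: any depth > 0 cert with CA:TRUE qualifies. A
    // CA:TRUE cert at depth 0 is still treated as a leaf.
    if (cert.depth > 0 && cert.is_ca) ca_offered = true;
  }
  if (ca_offered && leaf_dns_san_count >= 1) return PinMode::kCaAnchor;
  return PinMode::kLeafFingerprint;
}

// No auto-upgrade: once a pin exists, its mode wins over any new candidate.
PinMode effective_mode(bool already_pinned, PinMode stored, PinMode candidate) {
  return already_pinned ? stored : candidate;
}
```

Keeping the decision free of mbedTLS types is what makes it testable in a host build, which is presumably why the unit tests target this layer.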

Verification

  • ✅ Compiles on pioarduino_esp32c3 with SSL; examples/ssl_connection.cpp compiles + links on pioarduino_esp32 (flash 82.2%).
  • ✅ Host native env unit tests pass 7/7 — including that a mismatched leaf is rejected even with a CA offered, that a CA with no bindable leaf identity falls back to leaf-pinning, and that a leaf-pinned device stays in leaf mode when a CA is later presented (no auto-upgrade).
  • On-device (HALSER ↔ halosdev.local): firmware built against this branch boots, connects, and runs in CA-anchor mode — SK status Connected, deltas flowing (TLS chain validation + SAN binding holding live). The Signal K → SSL/TLS panel renders the rebuilt embedded bundle's new Pinned Certificate field: HaLOS Device CA (halosdev.local) (CA) (CN + captured identity + role). Device serves the new hashed asset (HTTP 200).
  • ⚠️ Remaining cert-chain scenarios not re-run in this PR's CI: leaf-rotation survival, same-CA SAN-mismatch rejection, and cert-error propagation across the esp_http_client paths are out of CI scope (mbedTLS isn't in the native build) and rely on manual hardware testing.

Follow-ups / decisions not verified on hardware

  • reset-tofu / config-PUT and the anchor: both can clear/set the trust anchor and ride the device's general HTTP auth — protected when an access password is set, reachable when auth is off (the default, which exposes the whole API regardless, not anything TOFU-specific). CA pinning raises the impact of an unauthorized anchor write (a planted CA vs one leaf), but it's not a new hole and the auth model is the mitigation. Non-gating follow-up: Consider gating TOFU trust-anchor writes (reset-tofu, config PUT) when device auth is off #1031.
  • basicConstraints CA:TRUE read via crt->MBEDTLS_PRIVATE(ca_istrue) (no public getter in mbedTLS 3.x); SAN via subject_alt_names + mbedtls_x509_parse_subject_alt_name (public).
  • Exact-match SAN binding means a HaLOS hostname reconfiguration (which changes the SAN set) requires a deliberate reset-tofu/re-pin — accepted as a rare, intentional event.

🤖 Generated with Claude Code

mairas commented Jun 21, 2026

Collaborator Author

Adversarial code review (3 fresh-context reviewers) + fixes

Ran an independent multi-persona review of the implementation. It caught a critical bug I'd introduced; all four code bugs listed below are fixed in this branch.

Fixed

  • CRITICAL — self-signed CA:TRUE leaf bypassed the fingerprint check. At depth 0, a leaf with basicConstraints CA:TRUE took the CA-stash branch and returned before the fingerprint decision, so in leaf-fingerprint mode a mismatched self-signed CA:TRUE cert was accepted and re-pinned (MITM). Fix: depth 0 always runs the fingerprint/role decision; only depth > 0 CA:TRUE certs are adopted as anchors. A single presented certificate is leaf-pinned, never adopted as a CA.
  • MAJOR — cross-leg stash. The esp_http_client access-request legs (VERIFY_OPTIONAL) could stash a CA that was then committed in on_connected, so the committed anchor could come from a leg that never proved leaf-key possession. Fix: connect_ws() clears the pending stash at start, so only the WebSocket handshake populates the committed anchor.
  • MAJOR — silent pin bypass on PEM overflow. cert_to_pem used a 2560-byte buffer; a larger CA produced an empty PEM that would commit as an empty (pin-disabling) anchor. Fix: buffer enlarged to 4096 and an empty PEM is skipped (fails safe to leaf mode), never stored.
  • MINOR — corrupt stored CA. Parse failure downgraded to VERIFY_OPTIONAL (misleading "capture mode" comment) while still rejecting. Fix: fail closed — keep VERIFY_REQUIRED, surface a cert error, require manual reset.
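
To make the cross-leg stash and PEM-overflow fixes concrete, here is a minimal sketch of a stash-then-commit holder. `AnchorStash` and its methods are hypothetical names, and the real code also has to coordinate across tasks, which this sketch ignores.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical holder for a candidate trust anchor seen during a handshake.
class AnchorStash {
 public:
  // Called at the start of the WebSocket leg so that nothing stashed by an
  // earlier (VERIFY_OPTIONAL) HTTP leg can be committed.
  void clear() { pending_.reset(); }

  // Called from the verify callback. An empty PEM (for example after a
  // serialization failure) is ignored, so it can never become a stored anchor.
  void stash(std::string ca_pem) {
    if (ca_pem.empty()) return;
    pending_ = std::move(ca_pem);
  }

  // Called only after the connection has succeeded. Returns the candidate, if
  // any, and leaves the stash empty.
  std::optional<std::string> commit() {
    std::optional<std::string> out = std::move(pending_);
    pending_.reset();
    return out;
  }

 private:
  std::optional<std::string> pending_;
};
```

A caller that gets `std::nullopt` from `commit()` would fall back to leaf-fingerprint pinning, matching the fail-safe behavior described in the PEM overflow item.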

Accepted / follow-up (not fixed here)

  • CA-mode disables the hostname check (skip_cert_common_name_check), so the pin trusts any leaf the CA ever signs. This is the documented single-purpose-private-CA assumption (correct for HaLOS's per-device CA); it widens trust vs leaf-fingerprint mode and would need SAN binding for a multi-service CA.
  • Config PUT can write the anchor (from_json applies tofu_ca_pem/tofu_fingerprint), bypassing capture-after-handshake. Shares a root cause with the unauthenticated reset-tofu endpoint (auth_required_=false default). Both belong to the same follow-up: gate the config/command API and treat anchor fields as non-user-writable.
  • Captured CA not cryptographically tied to the leaf at capture time (a misconfigured server presenting an unrelated bystander CA would pin it, then fail closed on the next connect). Noted as a limitation; single-CA deployments are unaffected.

Verification status

Compiles on esp32c3+SSL; pure decision unit tests pass (6/6). The callback path (incl. the fixed depth-0 handling and the cross-leg stash) is not on-device verified — mbedTLS isn't in the native build. The reviewer-recommended on-device case (stored fingerprint + mismatched self-signed CA:TRUE leaf → reject) should be run on hardware.

mairas force-pushed the feat/tofu-ca-pin branch 2 times, most recently from d458a07 to b29fda1, on June 22, 2026 10:17

mairas commented Jun 22, 2026

Collaborator Author

Added leaf-identity (SAN) binding in CA-anchor mode, folded into the feature commit.

The earlier design pinned the CA but kept the hostname check off, which is only safe for a single-purpose private CA. As discussed, that's a real hole (and a regression) for a public CA: pinning the Let's Encrypt intermediate with no identity check would accept any LE-signed leaf (for example, a cert for attacker.com) as a valid MITM. SensESP must support non-HaLOS environments, so this can't be left as a documentation caveat.

Now: capture the leaf's DNS SAN set with the CA and require a reconnecting leaf to present that same identity and chain to the pinned CA. A leaf for a different name from the same CA is rejected. A CA is adopted only when the leaf has a bindable identity (≥1 SAN); otherwise it falls back to leaf-fingerprint mode. Compiles on esp32c3+SSL; native decision tests 8/8. Still needs on-device verification of the runtime SAN comparison.
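
For illustration, exact-match SAN binding needs a canonical form so that ordering or letter case in the certificate doesn't cause false mismatches. The sketch below assumes lowercase, sort, dedupe, and comma-join; the exact normalization SensESP applies isn't spelled out in this thread, and `normalize_sans` is a hypothetical name. The comma-joined output is at least consistent with the `tofu_san` value reported in the Pass 2 results.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Canonicalize a list of DNS SANs: drop empties, lowercase, sort, dedupe,
// then join with commas. Two leaves bind to the same identity iff their
// canonical strings are equal.
std::string normalize_sans(std::vector<std::string> sans) {
  sans.erase(std::remove_if(sans.begin(), sans.end(),
                            [](const std::string& s) { return s.empty(); }),
             sans.end());
  for (std::string& s : sans) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char ch) {
      return static_cast<char>(std::tolower(ch));
    });
  }
  std::sort(sans.begin(), sans.end());
  sans.erase(std::unique(sans.begin(), sans.end()), sans.end());

  std::string joined;
  for (const std::string& s : sans) {
    if (!joined.empty()) joined += ',';
    joined += s;
  }
  return joined;
}
```

An empty result means the leaf has no bindable identity, which is the case where capture falls back to leaf-fingerprint mode.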

mairas commented Jun 22, 2026

Collaborator Author

On-device Pass 1 (migration + leaf-mode) verified on a HALSER (ESP32-C3) against a leaf-only Signal K server:

  • Built the wind-interface firmware against this branch and flashed app-only (SPIFFS/config preserved), so the device kept its existing stored leaf fingerprint from the old leaf-pinning code.
  • On boot, from_json loaded that legacy fingerprint → leaf-fingerprint mode → reconnected over wss → leaf matched → SK connection status = Connected, on_connected fired, no cert error, no crash/reboot.

So the migration is non-breaking and the leaf-mode path, the cross-task pending/commit, and the new state machine all run on real hardware. Also rebased onto current main (no TOFU files touched by the 2 intervening commits).
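
One way to picture why the migration is non-breaking: a legacy config carries only a leaf fingerprint, so it naturally lands in leaf-fingerprint mode. The sketch below is an assumption about how a mode could be derived from stored fields. The struct and function names are hypothetical, and the real config also carries an explicit `tofu_pin_is_ca` flag, which this sketch does not model.

```cpp
#include <string>

enum class StoredPinState { kUnpinned, kLeafFingerprint, kCaAnchor };

// Hypothetical subset of the persisted TOFU config.
struct StoredPin {
  std::string ca_pem;            // empty when no CA anchor is stored
  std::string leaf_fingerprint;  // empty when no leaf pin is stored
};

// Derive the pin state from which field is populated. A legacy config with
// only a fingerprint therefore resolves to leaf-fingerprint mode.
StoredPinState state_from_config(const StoredPin& pin) {
  if (!pin.ca_pem.empty()) return StoredPinState::kCaAnchor;
  if (!pin.leaf_fingerprint.empty()) return StoredPinState::kLeafFingerprint;
  return StoredPinState::kUnpinned;
}
```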

Still to verify on-device (needs the server to present its CA, i.e. halos-core-containers#197 deployed): CA adoption, SAN identity binding, the verified upgrade from a leaf pin, and leaf-rotation survival.

mairas force-pushed the feat/tofu-ca-pin branch from b29fda1 to 854fb71 on June 22, 2026 12:43

mairas commented Jun 22, 2026

Collaborator Author

Pass 2 verified on-device (HALSER + halosdev with the CA chain served, i.e. halos-core-containers#197 deployed):

  • Verified upgrade: with the device holding its legacy leaf pin and halosdev now serving leaf+CA, the device logged "trusted leaf matched, upgrading pin to issuing CA" and committed it. Config after: tofu_pin_is_ca=true, tofu_san='halosdev.hal,halosdev.local', tofu_pin_cn='HaLOS Device CA (halosdev.local)', CA PEM stored, leaf fingerprint cleared.
  • CA-anchor validation: subsequent connects log "pinned CA installed as trust anchor" → "Certificate verified" → connected.
  • Leaf rotation survival: forced halosdev to re-sign fresh leaves (fingerprint changed across rotations, same CA + SAN); the device stayed connected by validating each new leaf against the pinned CA — the exact case that broke the old leaf-pin.
  • SAN-mismatch rejection (the security property): installed a leaf signed by the same CA but with SAN evil.local (consistent cert+key, valid EKU). The device logged "TOFU: leaf identity (SAN) mismatch, rejecting" (-0x3000 = X509_FATAL_ERROR, the code mbedTLS reports when the verify callback fails) and refused to connect: a valid same-CA leaf for a different identity is rejected, closing the public-CA hole.
  • Cert-error state: during that rejection, SK connection status read "Certificate verification failed" (the distinct kSKWSCertificateError), not a generic disconnect, confirming the cross-task flag propagation the review flagged.
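
The cert-error propagation verified above can be sketched as a flag set by whichever task runs the handshake and consumed at a single state-setting choke point. Every name below is hypothetical, and only the "Connected" and "Certificate verification failed" strings are taken from this PR. The actual mapping lives in set_connection_state and may differ in detail.

```cpp
#include <atomic>

enum class ConnState { kDisconnected, kConnecting, kConnected, kCertificateError };

// Set from whichever task runs the TLS handshake, read by the state setter.
std::atomic<bool> g_cert_error_seen{false};

void note_certificate_failure() { g_cert_error_seen.store(true); }

// Single choke point: a disconnect that follows a certificate failure is
// reported as a certificate error, whichever TLS path produced it.
ConnState resolve_state(ConnState requested) {
  if (requested == ConnState::kConnected) {
    g_cert_error_seen.store(false);  // a good connection clears stale errors
    return requested;
  }
  if (requested == ConnState::kDisconnected &&
      g_cert_error_seen.exchange(false)) {
    return ConnState::kCertificateError;
  }
  return requested;
}

// Status-page text; the "Connecting" and "Disconnected" strings are assumed.
const char* status_text(ConnState state) {
  switch (state) {
    case ConnState::kConnected: return "Connected";
    case ConnState::kConnecting: return "Connecting";
    case ConnState::kCertificateError: return "Certificate verification failed";
    case ConnState::kDisconnected: return "Disconnected";
  }
  return "Unknown";
}
```

Mapping in one place means individual TLS paths only need to raise the flag; none of them has to know how the status page words the failure.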

All restored to a healthy state afterward.

mairas and others added 2 commits June 22, 2026 18:54
TOFU pinned the SHA-256 of the leaf certificate, so every certificate
rotation broke the pinned connection and surfaced only as a generic
disconnect. Servers behind a stable private CA (e.g. HaLOS, which re-signs
the leaf with a fresh key on every renewal and hostname change) rotate the
leaf routinely, so the pin broke constantly.

Pin the issuing CA instead, with two internal modes chosen by what the
server presents:
- CA-anchor mode: the captured CA is installed as the mbedTLS trust anchor
  and the leaf is validated against it (REQUIRED-mode chain validation), so
  the pin survives leaf rotation.
- leaf-fingerprint mode: exact SHA-256 leaf match (previous behavior),
  retained for servers that present no CA.

The anchor is captured by cryptographic role (basicConstraints CA:TRUE),
not by position. A candidate is stashed during the handshake and persisted
only after the connection succeeds, so an unauthenticated MITM handshake
cannot plant an anchor. The mode is fixed at first capture: a device already
pinned to a leaf stays in leaf-fingerprint mode and does not adopt a CA from
a later reconnect. Moving an existing leaf pin to CA pinning requires a
manual reset-tofu followed by a fresh first-use capture.

Bind identity in CA-anchor mode: the leaf's DNS SAN set is captured with the
CA, and a reconnecting leaf must present the same identity AND chain to the
pinned CA. This keeps CA pinning safe even against a public CA such as
Let's Encrypt -- a valid leaf for a different name signed by the same CA is
rejected. A CA is adopted only when the leaf has a bindable identity;
otherwise the device falls back to leaf-fingerprint pinning.

Certificate verification failures set a distinct kSKWSCertificateError state,
mapped centrally in set_connection_state so all four TLS paths are covered, so
the Status page shows "Certificate verification failed" rather than a plain
disconnect. The pinned certificate's CN and role are exposed read-only for
auditability.

Adds a host `native` env and pure unit tests for the capture decision.

Refs #1027

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HE2LwSbPBfV7REXDMMiLwu

Regenerate src/sensesp/net/web/autogen/frontend_files.h from the built web
UI so the embedded bundle matches the SSL/TLS settings panel update (pinned-
certificate identity / CN display) in the CA-pinning change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01HE2LwSbPBfV7REXDMMiLwu
mairas force-pushed the feat/tofu-ca-pin branch from ebf8e58 to 0180f59 on June 22, 2026 15:54
mairas merged commit d3bdc21 into main on Jun 22, 2026
8 checks passed
mairas deleted the feat/tofu-ca-pin branch on June 22, 2026 16:37
Development

Successfully merging this pull request may close these issues.

Pin the Signal K server's issuing CA (TOFU) instead of the leaf certificate
