refactor(iroh-dns): Replace hickory with a simpledns based DNS resolver #4036
Draft
dignifiedquire wants to merge 68 commits into
I don't remember the exact details, but there was something that hickory could do that simple-dns can't. @Frando ?
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/4036/docs/iroh/ Last updated: 2026-06-16T20:35:48Z
Recursive resolvers commonly return CNAME records when a queried name
is an alias. The previous code only looked for the exact queried record
type (A/AAAA/TXT) in the answer section, silently returning empty
results for any name behind a CNAME (common with CDNs, cloud LBs).
This adds two levels of CNAME following:
(a) In-response: resolve_cname_chain() walks CNAME records within a
single response packet to find the canonical name, then collects
records matching either the original or canonical name.
(b) Recursive: send_query_following_cnames() detects when a response
contains only a CNAME with no target records, and issues a new
query for the CNAME target. Limited to 8 hops to prevent loops.
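The in-response step (a) can be sketched as follows. This is a minimal illustration, not the PR's code: the `Answer` type and the `canonical_name` and `collect_a` helpers are hypothetical stand-ins for the simple-dns record types and for `resolve_cname_chain()`, and the hop limit of 8 is borrowed from the recursive step (b).

```rust
use std::collections::HashSet;
use std::net::Ipv4Addr;

/// Hypothetical, simplified answer record (not the simple-dns type).
#[derive(Debug, Clone, PartialEq)]
enum Answer {
    Cname { name: String, target: String },
    A { name: String, addr: Ipv4Addr },
}

/// Assumed hop limit, mirroring the limit described for recursive following.
const MAX_CNAME_HOPS: usize = 8;

/// Walk the CNAME records of a single response, starting at `query`,
/// and return the last name in the chain (the canonical name).
fn canonical_name(query: &str, answers: &[Answer]) -> String {
    let mut current = query.to_ascii_lowercase();
    let mut seen = HashSet::new();
    for _ in 0..MAX_CNAME_HOPS {
        // Stop if a name repeats: the chain loops.
        if !seen.insert(current.clone()) {
            break;
        }
        let next = answers.iter().find_map(|answer| match answer {
            Answer::Cname { name, target } if name.eq_ignore_ascii_case(&current) => {
                Some(target.to_ascii_lowercase())
            }
            _ => None,
        });
        match next {
            Some(target) => current = target,
            None => break,
        }
    }
    current
}

/// Collect A records owned by either the queried name or its canonical name.
fn collect_a(query: &str, answers: &[Answer]) -> Vec<Ipv4Addr> {
    let canonical = canonical_name(query, answers);
    answers
        .iter()
        .filter_map(|answer| match answer {
            Answer::A { name, addr }
                if name.eq_ignore_ascii_case(query) || name.eq_ignore_ascii_case(&canonical) =>
            {
                Some(*addr)
            }
            _ => None,
        })
        .collect()
}
```

With a response holding `www.example.com CNAME cdn.example.net` and `cdn.example.net A 192.0.2.1`, `collect_a("www.example.com", ..)` returns the address even though no A record is owned by the queried name.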
Without EDNS(0), well-behaved DNS servers limit UDP responses to 512 bytes (RFC 1035). Iroh endpoint TXT records with multiple addresses can easily exceed this, forcing a TCP fallback round-trip on every endpoint discovery query. Add an OPT pseudo-record to all outgoing queries advertising 1232-byte UDP payload support (the recommended safe value per RFC 6891 and DNS flag day 2020). This avoids unnecessary TCP fallbacks while staying under common path MTU limits.
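For reference, the wire form of such an OPT pseudo-record is small. The sketch below encodes it by hand following RFC 6891. The function name `encode_opt_record` is hypothetical, and the PR presumably builds the record through simple-dns rather than writing raw bytes.

```rust
/// Encode a minimal EDNS(0) OPT pseudo-record (RFC 6891, section 6.1.2)
/// that only advertises the requestor's UDP payload size.
fn encode_opt_record(udp_payload_size: u16) -> [u8; 11] {
    let mut buf = [0u8; 11];
    // buf[0] stays 0: the owner name is the root domain.
    // TYPE = 41 (OPT).
    buf[1..3].copy_from_slice(&41u16.to_be_bytes());
    // The CLASS field is reused to carry the UDP payload size.
    buf[3..5].copy_from_slice(&udp_payload_size.to_be_bytes());
    // buf[5..9] stays 0: extended RCODE, EDNS version 0, and flags.
    // buf[9..11] stays 0: RDLENGTH is zero because no options are attached.
    buf
}
```

The record is appended to the additional section of the query, and the header's ARCOUNT must be incremented to match.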
Previously, send_query tried nameservers sequentially with a 5-second per-nameserver timeout. Since the outer DNS_TIMEOUT is 3 seconds, only the first nameserver was ever reached. A single UDP packet loss meant immediate failure.
Now:
- All nameservers are queried in parallel via FuturesUnorderedBounded, with staggered starts (100ms between each) so the preferred nameserver gets a head start.
- UDP queries retry once per nameserver (2 attempts total) before giving up, matching hickory-resolver's default attempts:2 behavior.
- Per-nameserver timeout reduced to 2s so individual attempts complete quickly and don't block the overall query.
- First successful response from any nameserver wins.
Also fixes CNAME name-matching to gracefully handle responses without a question section (accept all records of the target type).
Use recv_from instead of recv to verify that the DNS response came from the expected nameserver. Without this check, a local network attacker could race a spoofed response before the real one arrives. The random 16-bit query ID provides some defense, but source address validation is standard defense-in-depth against cache poisoning.
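A minimal sketch of the source check using the standard library's blocking `UdpSocket`. The PR's transport is async, and the text above does not say whether it discards a mismatching datagram or returns an error, so the discard-and-keep-waiting behavior here is an assumption, as is the name `recv_from_expected`.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Receive the next datagram that was sent by `expected`, silently dropping
/// datagrams from any other source address.
///
/// The caller should set a read timeout on `socket`; otherwise this loops
/// until a datagram from `expected` arrives.
fn recv_from_expected(
    socket: &UdpSocket,
    expected: SocketAddr,
    buf: &mut [u8],
) -> io::Result<usize> {
    loop {
        let (len, source) = socket.recv_from(buf)?;
        if source == expected {
            return Ok(len);
        }
        // Datagram from an unexpected peer: ignore it and keep waiting.
    }
}
```

The query ID still has to be checked after this; the address check only narrows who can race a response.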
DNS-over-TLS and DNS-over-HTTPS currently derive the TLS server name from the IP address. This works for providers with IP SANs in their certificates (Google, Cloudflare) but will fail for servers with hostname-only certificates. Document this as a known limitation.
TxtRecordData was changed from Box<[Box<[u8]>]> to Box<[String]>, which is lossy for non-UTF-8 TXT record content and breaks the public API. Restore the original bytes representation to preserve binary TXT record fidelity. Display still uses from_utf8_lossy for rendering. Keep From<Vec<String>> for convenience at construction sites.
The previous extract_txt_record_data used TXT::attributes() which returns a HashMap, losing ordering and deduplicating keys. This is destructive for iroh's endpoint records which publish multiple addr= entries as separate TXT records. Replace with String::try_from(txt) which preserves the raw concatenated content of each TXT record faithfully.
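The loss is easy to reproduce without any DNS library. The sketch below is illustrative only: both helper names are hypothetical, and it parses plain `key=value` strings rather than calling `TXT::attributes()` or `String::try_from(txt)`.

```rust
use std::collections::HashMap;

/// Collect `key=value` strings into a map: later duplicates overwrite
/// earlier ones and iteration order is unspecified.
fn attributes_as_map(txt_records: &[&str]) -> HashMap<String, String> {
    txt_records
        .iter()
        .filter_map(|record| record.split_once('='))
        .map(|(key, value)| (key.to_owned(), value.to_owned()))
        .collect()
}

/// Collect the same strings into a vector: duplicates and order survive.
fn attributes_in_order(txt_records: &[&str]) -> Vec<(String, String)> {
    txt_records
        .iter()
        .filter_map(|record| record.split_once('='))
        .map(|(key, value)| (key.to_owned(), value.to_owned()))
        .collect()
}
```

Given two records that both start with `addr=`, the map keeps one entry while the vector keeps both.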
The resolv.conf parser previously only extracted nameserver lines, ignoring search and domain directives. This meant search domain completion for short hostnames was silently broken. Parse both directives per resolv.conf(5) semantics: search and domain are mutually exclusive, last one wins. Introduce SystemDnsConfig struct to carry both nameservers and search domains. The resolver does not yet apply search domains to queries -- this commit just ensures the configuration is read and available.
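A minimal sketch of the "last one wins" rule, assuming a hypothetical `parse_search_domains` helper that only looks at `search` and `domain` lines. The real parser also fills in the nameservers of `SystemDnsConfig`.

```rust
/// Extract the effective search list from resolv.conf content.
/// `search` and `domain` are mutually exclusive: whichever directive
/// appears last replaces anything set before it.
fn parse_search_domains(conf: &str) -> Vec<String> {
    let mut search = Vec::new();
    for line in conf.lines() {
        let line = line.trim();
        // resolv.conf comments start with '#' or ';'.
        if line.starts_with('#') || line.starts_with(';') {
            continue;
        }
        let mut parts = line.split_whitespace();
        match parts.next() {
            // `search` takes a whitespace-separated list of domains.
            Some("search") => search = parts.map(str::to_owned).collect(),
            // `domain` takes a single domain name.
            Some("domain") => search = parts.next().map(str::to_owned).into_iter().collect(),
            _ => {}
        }
    }
    search
}
```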
Some setups (Docker, VPNs, custom resolvers) use non-standard DNS ports. Previously, entries like "nameserver 8.8.8.8:5353" would silently fail to parse as IpAddr and be skipped. Now try parsing as SocketAddr first (which supports port), falling back to IpAddr with the default port 53.
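The fallback order described here maps directly onto the standard library parsers. A sketch, with `parse_nameserver` and `DEFAULT_DNS_PORT` as hypothetical names:

```rust
use std::net::{IpAddr, SocketAddr};

/// Standard DNS port, used when the entry carries no explicit port.
const DEFAULT_DNS_PORT: u16 = 53;

/// Parse the value of a `nameserver` line, accepting either `ip:port`
/// (or `[ipv6]:port`) or a bare IP address.
fn parse_nameserver(value: &str) -> Option<SocketAddr> {
    let value = value.trim();
    // Try the form with an explicit port first.
    if let Ok(addr) = value.parse::<SocketAddr>() {
        return Some(addr);
    }
    // Fall back to a bare IP address on the default port.
    value
        .parse::<IpAddr>()
        .ok()
        .map(|ip| SocketAddr::new(ip, DEFAULT_DNS_PORT))
}
```

A bare IPv6 address such as `2001:4860:4860::8888` fails the `SocketAddr` parse (which requires brackets around the address) and is picked up by the `IpAddr` fallback.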
NXDOMAIN (domain doesn't exist) and SERVFAIL (server error) were lumped into the same ServerError variant. Add a dedicated NxDomain variant so callers can distinguish "this domain doesn't exist" from "DNS is broken" and skip retries for definitive NXDOMAIN responses.
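A sketch of how the split might look to a caller. The `LookupError` enum and both helpers are hypothetical and are not the crate's `DnsError`; only the RCODE values (0 for NOERROR, 3 for NXDOMAIN) come from RFC 1035.

```rust
/// Hypothetical error type separating a definitive negative answer
/// from other server-side failures.
#[derive(Debug, PartialEq, Eq)]
enum LookupError {
    /// RCODE 3: the name does not exist.
    NxDomain,
    /// Any other non-zero RCODE, for example 2 (SERVFAIL).
    ServerError(u8),
}

/// Map a DNS response code to a result.
fn check_rcode(rcode: u8) -> Result<(), LookupError> {
    match rcode {
        0 => Ok(()),
        3 => Err(LookupError::NxDomain),
        other => Err(LookupError::ServerError(other)),
    }
}

/// NXDOMAIN is definitive, so retrying the same name is pointless;
/// other server errors may be transient.
fn is_retryable(error: &LookupError) -> bool {
    !matches!(error, LookupError::NxDomain)
}
```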
dedup_by_key only removes consecutive duplicates, so if the same DNS server appears on non-adjacent network adapters, it would survive deduplication. Use a HashSet to properly deduplicate regardless of ordering.
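The difference shows up as soon as a duplicate is separated by another entry. The sketch below uses a hypothetical `dedup_nameservers` helper; keeping the first occurrence in its original position is an assumption, since the commit message only says a HashSet is used.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Remove duplicate nameservers regardless of position, keeping the
/// first occurrence of each address in its original order.
fn dedup_nameservers(servers: Vec<IpAddr>) -> Vec<IpAddr> {
    let mut seen = HashSet::new();
    servers
        .into_iter()
        // `insert` returns false when the address was already present.
        .filter(|ip| seen.insert(*ip))
        .collect()
}
```

For the list `[1.1.1.1, 8.8.8.8, 1.1.1.1]`, `Vec::dedup` and `dedup_by_key` leave all three entries in place because the two copies are not adjacent, while the helper above returns `[1.1.1.1, 8.8.8.8]`.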
Document two known limitations:
1. No negative caching: NXDOMAIN/NODATA responses are never cached, which can cause thundering herd under high concurrency for non-existent domains. This matches the old hickory-resolver config.
2. No TCP/TLS connection reuse: each query opens a fresh connection, adding a full TLS handshake per DoT query. Only affects non-default DoT/DoH configurations.
The field is intentionally parsed from resolv.conf but not yet consumed by the resolver. Add allow(dead_code) with a comment explaining why.
Implement resolv.conf search domain semantics per resolv.conf(5):
- Short hostnames (fewer dots than ndots, default 1) try each search domain suffix first, then the bare name.
- Names with enough dots try the bare name first, then search domains.
- FQDNs (trailing dot) bypass search domains entirely.
- NXDOMAIN responses advance to the next candidate name rather than failing immediately.
This makes the custom resolver behave like system resolvers for short hostnames, which matters for Docker, Kubernetes, and corporate network setups where search domains are commonly configured.
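The candidate ordering can be sketched as a pure function. `candidate_names` is a hypothetical helper; the actual resolver also has to issue the queries and advance on NXDOMAIN, which is left out here.

```rust
/// Build the ordered list of names to try for `name`, given the search
/// list and the `ndots` threshold from resolv.conf.
fn candidate_names(name: &str, search: &[&str], ndots: usize) -> Vec<String> {
    // A trailing dot marks a fully qualified name: no search domains apply.
    if name.ends_with('.') {
        return vec![name.to_owned()];
    }
    let dots = name.bytes().filter(|byte| *byte == b'.').count();
    let with_suffix = search.iter().map(|domain| format!("{name}.{domain}"));
    if dots < ndots {
        // Short name: search domains first, bare name last.
        with_suffix.chain(std::iter::once(name.to_owned())).collect()
    } else {
        // Enough dots: bare name first, then search domains.
        std::iter::once(name.to_owned()).chain(with_suffix).collect()
    }
}
```

With the search list `svc.cluster.local example.com` and `ndots` of 1, the short name `db` yields `db.svc.cluster.local`, `db.example.com`, `db`, in that order.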
- Collapse nested if/if-let into if-let chains (resolve_cname_chain)
- Use .then() and .ok()? for cname_target
- Use is_ok_and for is_truncated
- Use matches! for record type checking
- Use bytes() instead of chars() for dot counting
- Extract with_timeout helper to deduplicate timeout wrapping
- Simplify stagger logging in send_query
- Use elapsed() instead of manual Instant arithmetic in cache
- Remove dead system_nameservers() wrapper
- Remove stale #[allow(dead_code)] on search_domains
- with_timeout: use `?` on Elapsed to get DnsError::Timeout directly instead of wrapping in a fake io::Error -> DnsError::Transport
- InvalidPacket: use `?` on SimpleDnsError directly (from_sources generates the From impl)
- UDP source validation: use `?` on io::Error instead of e!() wrapper
- TLS config missing: use io::Error::other for brevity
- Remove unused n0_error::e import from transport.rs
I think it can do recursive resolution, but that is disabled by default. So I don't think we have to worry about it.
Move the per-platform system DNS readers into unix, windows and android submodules under dns::system_config, each exposing read_system_dns(). Android no longer forwards to hickory_resolver: its JNI reader, which reads the active network's DNS servers from LinkProperties.getDnsServers(), is inlined and adapted to return SystemDnsConfig. install_android_jni_context moves alongside it and is re-exported at the crate root unchanged. Add jni (android) and ipconfig (windows) as platform dependencies.
Replace the fixed-stagger fan-out in send_query with a bounded happy-eyeballs loop: try the historically fastest nameserver first, start the next after a short delay or as soon as the in-flight one fails, and cap concurrency so a long nameserver list no longer blasts every server. Track a per-nameserver smoothed RTT (EWMA on success, penalty on failure, read-time decay) to order servers and re-probe demoted ones, so the list is self-healing. Expand the fallback to Cloudflare, Google and Quad9 over UDP (v4 and v6, primary and secondary) plus DNS-over-HTTPS when a crypto provider is available, so resolution still works when one provider is down or plain DNS is blocked. Refactor the system config type to DnsConfig with system, fallback, from_nameservers and system_with_fallback constructors.
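The per-nameserver bookkeeping can be sketched as below. Everything specific is an assumption made for illustration: the `SmoothedRtt` type, the 3/4 to 1/4 weighting, and the 500 ms failure penalty are not taken from the PR, and read-time decay is left out.

```rust
use std::time::Duration;

/// Assumed penalty added to the estimate when a query fails.
const FAILURE_PENALTY: Duration = Duration::from_millis(500);

/// Hypothetical per-nameserver smoothed round-trip time.
#[derive(Debug, Clone, Copy, Default)]
struct SmoothedRtt {
    /// `None` until the nameserver has been probed at least once.
    value: Option<Duration>,
}

impl SmoothedRtt {
    /// Fold a successful sample in as an exponentially weighted moving
    /// average: 3/4 of the old estimate plus 1/4 of the new sample.
    fn record_success(&mut self, sample: Duration) {
        self.value = Some(match self.value {
            None => sample,
            Some(previous) => (previous * 3 + sample) / 4,
        });
    }

    /// Demote the nameserver after a failure by adding a fixed penalty.
    fn record_failure(&mut self) {
        self.value = Some(self.value.unwrap_or_default() + FAILURE_PENALTY);
    }
}
```

Starting from a first sample of 100 ms, a second sample of 200 ms moves the estimate to 125 ms, and a subsequent failure pushes it to 625 ms, which sorts the server behind healthier ones until later successes pull the estimate back down.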
Return a DnsConfig instead of a 3-tuple and fold the public-resolver fallback into a single trailing check applied in one place.
Order nameservers by smoothed RTT relative to a neutral baseline so a measured-fast server stays ahead of an idle or recovering one, instead of unprobed or decayed servers sorting as fastest. Verify the host and query type on a DNS cache hit, so a 64-bit key collision returns a miss rather than another name's records. Fall back to public resolvers only when neither the system nor the builder provides a nameserver, so explicit servers are no longer mixed behind the fallback set; drop the now-redundant system_with_fallback. Correct the DoH-by-IP certificate comment.
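The cache-hit check can be sketched as follows. The `Cache`, `Entry`, and `QueryType` types and the use of `DefaultHasher` are hypothetical; the point is only that an entry found under a 64-bit key is returned when its stored host and query type match the lookup, and treated as a miss otherwise.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum QueryType {
    A,
    Aaaa,
    Txt,
}

/// Cached records together with the name and type they were stored for.
struct Entry {
    host: String,
    qtype: QueryType,
    records: Vec<String>,
}

#[derive(Default)]
struct Cache {
    entries: HashMap<u64, Entry>,
}

/// Hash the host and query type into a 64-bit cache key.
fn cache_key(host: &str, qtype: QueryType) -> u64 {
    let mut hasher = DefaultHasher::new();
    host.hash(&mut hasher);
    qtype.hash(&mut hasher);
    hasher.finish()
}

impl Cache {
    fn insert(&mut self, host: &str, qtype: QueryType, records: Vec<String>) {
        let entry = Entry { host: host.to_owned(), qtype, records };
        self.entries.insert(cache_key(host, qtype), entry);
    }

    fn get(&self, host: &str, qtype: QueryType) -> Option<&[String]> {
        self.get_by_key(cache_key(host, qtype), host, qtype)
    }

    /// Look up by key, then verify the entry really belongs to this
    /// host and query type. A colliding key yields a miss.
    fn get_by_key(&self, key: u64, host: &str, qtype: QueryType) -> Option<&[String]> {
        let entry = self.entries.get(&key)?;
        (entry.host == host && entry.qtype == qtype).then(|| entry.records.as_slice())
    }
}
```

`get_by_key` is split out only so that a collision can be simulated by passing one name's key together with another name.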
Confirmed via the live certificates that Cloudflare, Google and Quad9 all carry their anycast IPs as iPAddress SANs, so IP-addressed DNS-over-HTTPS validates for every fallback entry.
Carry an optional TLS server name per nameserver internally, so DNS-over-TLS and DNS-over-HTTPS can be configured against providers whose certificates cover a hostname rather than the IP. Add Builder::with_tls_nameserver and Builder::with_https_nameserver; the existing tuple-based with_nameserver/with_nameservers and the DnsProtocol enum are unchanged, so the change is additive. DoT uses the name for SNI; DoH addresses the URL by hostname with the connection pinned to the configured IP via reqwest, avoiding a bootstrap resolution loop.
Narrow items that are only reachable within their own module to private: the internal Nameserver type (and its fields and constructor), DnsConfig::from_nameservers, attrs::endpoint_id_from_txt_name, and TxtAttrs::from_strings.
On a fresh resolver, fallback nameservers are queried in list order, and the DoH entries sat behind all twelve UDP servers. On a network that silently drops UDP/53 the lookup timed out before DoH was ever tried -- the exact case it exists for. Move the DoH entries just after the two fastest UDP primaries so they land within the happy-eyeballs first wave (MAX_CONCURRENT_QUERIES). On a working network UDP still answers before the staggered DoH attempts start, so no TLS handshake is wasted; when port 53 is blocked DoH is raced immediately. A test pins a DoH entry within the first wave.
A major network change rebuilds the resolver via `reset()` to pick up new nameservers. It used to start with an empty cache, so lookups went cold exactly during the transition (e.g. WiFi to 5G) and reconnects stranded while DNS was still in flux. Make `DnsCache` Arc-backed and carry it into the rebuilt resolver, so cached records keep serving until the new nameservers settle. Closes #4037
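The idea of carrying the cache across a rebuild can be sketched with a shared handle. The names `DnsCache` and `reset()` come from the text above; the fields, the `Resolver` type shown here, and the string-based records are hypothetical simplifications.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Cheaply clonable cache handle: clones share the same underlying map.
#[derive(Clone, Default)]
struct DnsCache {
    inner: Arc<Mutex<HashMap<String, Vec<String>>>>,
}

impl DnsCache {
    fn insert(&self, host: &str, records: Vec<String>) {
        self.inner.lock().unwrap().insert(host.to_owned(), records);
    }

    fn get(&self, host: &str) -> Option<Vec<String>> {
        self.inner.lock().unwrap().get(host).cloned()
    }
}

/// Hypothetical resolver holding a nameserver list and the shared cache.
struct Resolver {
    nameservers: Vec<String>,
    cache: DnsCache,
}

impl Resolver {
    fn new(nameservers: Vec<String>) -> Self {
        Resolver { nameservers, cache: DnsCache::default() }
    }

    /// Rebuild with new nameservers while keeping the existing cache.
    fn reset(&self, nameservers: Vec<String>) -> Resolver {
        Resolver { nameservers, cache: self.cache.clone() }
    }
}
```

Records cached before the network change remain readable through the rebuilt resolver, so lookups do not go cold while the new nameservers are still settling.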
No description provided.