This is just a brief document before our sync. It will be updated as needed and finalized through our sync.

IP Tables

As shown in the previous sync, iptables can solve the DNS problem in a Docker network as a proof of concept. The idea:
For the Docker case we add one more rule in

This works, but as discussed, we need a more dynamic solution, so I keep it as the baseline/fallback only.

IPVS

IPVS (IP Virtual Server) is an efficient load balancer built on netfilter that sits in front of a cluster of servers. It works in three modes: NAT, Direct Routing, and IP Tunneling. It is also not our solution, for a simple reason:
So it adds a lot of machinery for nothing here. Not pursued.

TC

In our current implementation of the dynamic network we depend on a simple idea: the TC ingress hook is triggered before the IP stack, so we mirror all traffic from the container veth into our tap, and we do the same on the tap ingress. This gives us a bidirectional flow between the two:

Also, this explains why any manipulation of

So for the Docker case, as a PoC, we can depend on TC alone and drop all the routing stuff. The point is that

The proposal is simple: with the same hole on a specific IP, we route the packets into

The tcpdump output, working end to end. As we see here, the packet shows up as out on

The last thing: loopback writes the src & dst MAC as zero, which makes our guest drop the packets, so in the TC rule on

How we could generalize

Given this small idea, if we generalize to N taps (N containers) in our main namespace, we'll have N+1 interfaces (the taps +
The catch (return path): the approach above seems to work, and the unique-port rule handles the inbound/listening direction fine, because a listening port is a stable key. But for traffic the guest starts (like DNS), the reply comes back to the guest's ephemeral source port (e.g.

Two ways out:
TC + eBPF

The static-rule approach above is enough for the single-container PoC, or for multiple containers with only listening ports. The more powerful option for the general case: instead of static TC rules, use eBPF programs at the hooks (ingress & egress), so we get all the power an eBPF program gives us, at the cost of C code plus another dependency on something like cilium/ebpf for loading and attaching the kernel programs.

The key thing eBPF buys us is state. A simple eBPF map, accessible from a user program if needed and shared between all the programs, can hold the ports + interfaces info, and (more importantly) we can record at ingress which tap a flow came from and look it up at egress to send the reply back to the right tap. That's exactly the return-path problem static rules can't solve in a shared netns. We can also store IPs, MACs, etc. (and update them even while the programs run), so even NATing (like the Docker DNS case) can be done at this level instead of with iptables.

So it's TC with an extra cost and a functional gain: the moment we go multi-container and need the guest-initiated return path, the map state is the thing that actually makes it work.
Mentee: @ (Ali Mohamed)
Mentors: @ananos (Tassos), @cmainas (Babis)
Term: June 8 – August 29, 2026 (12 weeks) · Midterm evaluation: end of Week 6 (~July 21) · Final evaluation: end of Week 12 (~August 29)
Sync: Weekly call, Mondays · Ad-hoc sessions as needed · Async via this Discussion + CNCF Slack (#urunc)
Background & motivation
urunc runs the unikernel/microVM in a separate network namespace inside the pod. As a result, localhost and Docker's embedded DNS resolver (127.0.0.11) are unreachable from the guest. The current dynamic network setup uses a TC-mirroring catch-all between the tap interface (tap0_urunc) and the veth pair, effectively bridging all traffic (see pkg/network/network.go#L187).

The same limitation exists in other sandboxed runtimes (e.g. Kata Containers), where only unmerged proposals/workarounds exist. Solving this generically is valuable beyond urunc.
A plan for a working prototype (iptables/ip-rule based) already exists from the proposal phase: hole-punch in the TC mirror for a specific IP, custom routing table below the local table, fwmark in the mangle OUTPUT chain, route_localnet, and ARP binding on the tap. This serves as the baseline/fallback. Its main downside: hard-coded, pod-subnet static IPs, the same drawback as the static Knative configuration.

Goal
A generic, dynamic mechanism (no static/hard-coded IPs) that allows the unikernel to reach host-namespace localhost services, starting with Docker's embedded DNS resolver, and that generalizes to multiple ports and the Kubernetes (multi-container pod) case. Target: code merged in urunc-dev, incrementally.

Plan (by week)
Phase 0 — Onboarding & analysis (Weeks 1–2, Jun 8–21)
Phase 1 — Design & PoC (Weeks 3–5, Jun 22–Jul 12)
tap0_urunc (e.g. 172.23.0.250) forcing traffic onto the tap + forwarding rules to the Docker resolver (127.0.0.11); solve the return-path problem without static in-namespace IPs
nslookup/app-level test against container names)
resolv.conf, but not all libOS/unikernels have the same way of configuring the DNS resolver of the system, so we need to understand what is supported and how to configure it properly (inject file? CLI option? etc.).

Phase 2 — Midterm milestone (Week 6, Jul 13–19)
🎯 Concrete milestone for the midterm evaluation (end of Week 6, ~Jul 21):
Acceptance criteria:
Phase 3 — Hardening & generalization (Weeks 7–9, Jul 20–Aug 9)
Phase 4 — Merge & wrap-up (Weeks 10–12, Aug 10–29)
urunc-dev

References

pkg/network/network.go#L187

Status updates: please post a short weekly comment (done / in progress / blockers) before each Monday sync.