anycast: revert to lo + add via=pod-eth0 next-hop on host route

Reverts the eth0-placement hack from e1e9544. The design doc's lo
placement is correct.

Real fix: the host's anycast /128 (or /32) route now uses the pod's own
eth0 unicast IP (same family) as the route's `via` next-hop. The kernel
then does NDP/ARP for that eth0 IP — which IS configured on the pod's
eth0 — so the pod responds normally with no proxy_ndp / proxy_arp
trickery on the anycast IP itself.

  ip -6 route add <anycast>/128 via <pod-eth0-v6> dev flock<8hex>
  ip -4 route add <anycast>/32  via <pod-eth0-v4> dev flock<8hex>
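
As a rough illustration of how the two commands above could be assembled, here is a minimal Go sketch. The helper name `routeAddArgs`, the example addresses, and the device name `flock0a1b2c3d` are hypothetical and not taken from the flock source; the sketch only builds the argv and does not touch the routing table.

```go
package main

import (
	"fmt"
	"net/netip"
)

// routeAddArgs (hypothetical helper) builds the argv for
// `ip route add <anycast>/<bits> via <via> dev <dev>`.
// It refuses a next-hop whose address family differs from the anycast IP's.
func routeAddArgs(anycast, via, dev string) ([]string, error) {
	a, err := netip.ParseAddr(anycast)
	if err != nil {
		return nil, fmt.Errorf("parse anycast %q: %w", anycast, err)
	}
	v, err := netip.ParseAddr(via)
	if err != nil {
		return nil, fmt.Errorf("parse next-hop %q: %w", via, err)
	}
	// Normalise IPv4-mapped IPv6 addresses so the family check is reliable.
	a, v = a.Unmap(), v.Unmap()
	if a.Is4() != v.Is4() {
		return nil, fmt.Errorf("next-hop %s is not in the same family as anycast %s", v, a)
	}
	family := "-6"
	if a.Is4() {
		family = "-4"
	}
	// BitLen is 32 for IPv4 and 128 for IPv6, which yields a host route.
	prefix := netip.PrefixFrom(a, a.BitLen())
	return []string{"ip", family, "route", "add", prefix.String(), "via", v.String(), "dev", dev}, nil
}

func main() {
	args, err := routeAddArgs("2001:db8::53", "fd00::2", "flock0a1b2c3d")
	if err != nil {
		panic(err)
	}
	fmt.Println(args)
}
```

The actual change presumably installs the route through netlink rather than by shelling out; the argv form is used here only because it mirrors the commands shown above.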

Validation: an anycast IP in a family for which the pod has no eth0
unicast address is skipped with a warning (e.g. a v4 anycast on an
IPv6-only pod has no v4 next-hop for the host to ARP for; such setups
require dual-stack).
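
The family check described above might look roughly like the following Go sketch. The names `nextHopFor` and `planRoutes` and the sample addresses are assumptions made for illustration; the real code works on the request's IP lists and may select the next-hop differently.

```go
package main

import (
	"fmt"
	"log"
	"net/netip"
)

// nextHopFor (hypothetical) returns the first pod eth0 unicast address
// that is in the same address family as the anycast IP.
func nextHopFor(anycast netip.Addr, podEth0 []netip.Addr) (netip.Addr, bool) {
	for _, ip := range podEth0 {
		if ip.Is4() == anycast.Is4() {
			return ip, true
		}
	}
	return netip.Addr{}, false
}

// planRoutes (hypothetical) maps each usable anycast IP to its next-hop.
// Anycast IPs with no same-family eth0 address are skipped with a warning.
func planRoutes(anycast, podEth0 []string) map[string]string {
	var unicast []netip.Addr
	for _, s := range podEth0 {
		if ip, err := netip.ParseAddr(s); err == nil {
			unicast = append(unicast, ip.Unmap())
		}
	}
	plan := make(map[string]string)
	for _, s := range anycast {
		ip, err := netip.ParseAddr(s)
		if err != nil {
			log.Printf("warn: skipping unparsable anycast %q: %v", s, err)
			continue
		}
		ip = ip.Unmap()
		via, ok := nextHopFor(ip, unicast)
		if !ok {
			log.Printf("warn: skipping anycast %s: pod has no eth0 unicast address in that family", ip)
			continue
		}
		plan[ip.String()] = via.String()
	}
	return plan
}

func main() {
	// An IPv6-only pod: the v4 anycast is skipped, the v6 one is kept.
	plan := planRoutes([]string{"192.0.2.53", "2001:db8::53"}, []string{"fd00::2"})
	fmt.Println(plan)
}
```

Skipping rather than failing keeps the rest of the pod's setup intact when only one family is unusable, which matches the warn-and-continue behaviour described above.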

Bonus cleanup: ESRCH ("no such route") from RouteDel is now treated as
success, so route deletion is idempotent.
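
A minimal Go sketch of that idempotent-delete pattern follows. `fakeRouteDel` is a hypothetical stand-in for the real netlink call so the example runs without privileges, and `delRouteIdempotent` is an illustrative name rather than the function used in this commit.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// fakeRouteDel is a hypothetical stand-in for a netlink route delete.
// "absent" mimics the kernel's ESRCH reply for a route that does not exist.
func fakeRouteDel(state string) error {
	switch state {
	case "absent":
		return fmt.Errorf("route del: %w", syscall.ESRCH)
	case "denied":
		return fmt.Errorf("route del: %w", syscall.EPERM)
	default:
		return nil
	}
}

// delRouteIdempotent runs del and treats ESRCH as success, because the
// desired end state (the route is gone) already holds.
func delRouteIdempotent(del func() error) error {
	if err := del(); err != nil && !errors.Is(err, syscall.ESRCH) {
		return err
	}
	return nil
}

func main() {
	for _, state := range []string{"present", "absent", "denied"} {
		err := delRouteIdempotent(func() error { return fakeRouteDel(state) })
		fmt.Printf("%s: %v\n", state, err)
	}
}
```

Only ESRCH is swallowed; any other errno, such as EPERM, still propagates to the caller.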

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Donavan Fritz
2026-04-25 08:02:51 -05:00
parent e1e9544e2e
commit 2082df37e5
2 changed files with 83 additions and 45 deletions
+14 -13
@@ -241,18 +241,19 @@ func configurePodSide(req SetupRequest) error {
 		}
 	}
-	// Anycast: assign each IP to pod eth0 (NOT lo).
-	//
-	// The original design doc proposed lo to avoid NDP/ARP DAD
-	// conflicts "across nodes advertising the same IP". That concern
-	// doesn't apply to flock: each pod's veth is its own private /64,
-	// so DAD on eth0 only sees the veth peer (host) — no cross-node
-	// L2 contention. Putting the IP on eth0 instead means the pod
-	// kernel answers NDP solicits arriving on eth0 for that IP, which
-	// is what the host's /128 host route requires. With anycast on
-	// lo, NDP from the host side fails and the kernel drops the
-	// packet between routing decision and transmit.
+	// Anycast: assign each IP to pod lo, per design doc. NDP/ARP for
+	// the anycast IP itself never happens because the host route on
+	// the host side is `<anycast> via <pod-eth0-ip> dev flock<8hex>`.
+	// The kernel resolves <pod-eth0-ip> via NDP/ARP — and that IP IS
+	// on eth0, so the pod responds normally.
 	if len(req.Anycast) > 0 {
+		lo, err := netlink.LinkByName("lo")
+		if err != nil {
+			return fmt.Errorf("lookup pod lo: %w", err)
+		}
+		if err := netlink.LinkSetUp(lo); err != nil {
+			return fmt.Errorf("set up pod lo: %w", err)
+		}
 		for _, ip := range req.Anycast {
 			var mask net.IPMask
 			if ip.To4() != nil {
@@ -262,8 +263,8 @@ func configurePodSide(req SetupRequest) error {
 				mask = net.CIDRMask(128, 128)
 			}
 			a := &netlink.Addr{IPNet: &net.IPNet{IP: ip, Mask: mask}, Scope: int(netlink.SCOPE_UNIVERSE)}
-			if err := netlink.AddrAdd(eth0, a); err != nil && !errors.Is(err, os.ErrExist) {
-				return fmt.Errorf("pod eth0 anycast %s: %w", ip, err)
+			if err := netlink.AddrAdd(lo, a); err != nil && !errors.Is(err, os.ErrExist) {
+				return fmt.Errorf("pod lo anycast %s: %w", ip, err)
 			}
 		}
 	}