Commit Graph

3 Commits

Author SHA1 Message Date
Donavan Fritz a7dc7bf1f4 anycast: kernel multipath route + L4 hash for multi-pod-per-node
Build flock Image / build (push) Has been cancelled
Move pure resolver logic out of anycast_linux.go into anycast.go so it's
unit-testable on any host. Reshape anycastTarget from a single
{hostIface, via} into a sorted list of nexthops; multiple Ready pods on
the same node binding the same anycast IP now contribute one nexthop
each.

installAnycastRoute uses RTA_MULTIPATH (via netlink.Route.MultiPath)
when the target has more than one nexthop. Single-nexthop targets keep
the simple via-route shape so 1-pod-per-node keeps rendering identically
to today's production form in `ip route show`.

flock-agent writes net.ipv{4,6}.fib_multipath_hash_policy = 1 at
startup so the kernel hashes flows on (saddr, daddr, sport, dport, proto)
rather than just IPs. Best-effort — runs privileged in production, so
it works; falls back to L3 hash on environments where the write fails
(only matters for the multi-pod-per-node case anyway).

resolveAnycastTargets sorts nexthops by canonical(via) for stable
comparison so a quiet reconcile pass doesn't churn the kernel route.

8 new unit tests cover: 1-pod, 2-pods-same-anycast (multi-nexthop),
NotReady drop, no-Ready omits the IP, pending skipped, mixed v6+v4,
family mismatch warns, determinism.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 09:57:32 -05:00
Donavan Fritz 2082df37e5 anycast: revert to lo + add via=pod-eth0 next-hop on host route
Build flock Image / build (push) Has been cancelled
Reverts the eth0-placement hack from e1e9544. The design doc's lo
placement is correct.

Real fix: the host's anycast /128 (or /32) route now uses the pod's own
eth0 unicast IP (same family) as the route's `via` next-hop. The kernel
then does NDP/ARP for that eth0 IP — which IS configured on the pod's
eth0 — so the pod responds normally with no proxy_ndp / proxy_arp
trickery on the anycast IP itself.

  ip -6 route add <anycast>/128 via <pod-eth0-v6> dev flock<8hex>
  ip -4 route add <anycast>/32  via <pod-eth0-v4> dev flock<8hex>

Validation: an anycast IP whose family the pod doesn't have a unicast
for is skipped with a warn (an v4 anycast on an IPv6-only pod cannot be
NDP-resolved this way; require dual-stack).

Bonus cleanup: ESRCH from RouteDel is treated as success (idempotent).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 08:02:51 -05:00
Donavan Fritz 89a3502446 M6: anycast — pod lo + Ready-gated /128/32 + BIRD export
Build flock Image / build (push) Has been cancelled
CNI ADD now adds anycast IPs to the pod's lo interface (NOT eth0 — design
doc rationale: avoid NDP/ARP DAD conflicts when N replicas share an IP).
Allocation persists the anycast list.

AnycastReconciler:
  desired = { ip → flock<8hex> } from
            committed allocations × pod.Status.PodReady=True
  diff against advertised, install/remove host /128 (v6) or /32 (v4)
  re-render bird.conf with the active set

Triggers: 2s tick, AfterCommit (per ADD/DEL), Pod informer Ready
transitions (PodCache.OnReadyChange callback).

The bird template already supported Anycast6/Anycast4 via the export
filter — this turn finally drives those slices from runtime.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 07:36:47 -05:00