flock

Author	SHA1	Message	Date
Donavan Fritz	c61b12204c	anycast: drop pods from nexthop set on DeletionTimestamp. Previously the AnycastReconciler kept a pod in the nexthop set as long as its PodReady condition was True. During a rolling restart, that leaves a window after the kubelet has delivered SIGTERM (DeletionTimestamp set, pod still Ready until probes observe shutdown) in which BGP still advertises a path through the dying pod's veth, so in-flight requests get RST'd when the container actually exits. Fix: introduce podAnycastEligible(pod) = !DeletionTimestamp && Ready, swap it in at the AnycastReconciler's isReady callback, and fire the ready-change callback when DeletionTimestamp transitions (the informer UpdateFunc previously fired only on Ready transitions). Result: as soon as the apiserver marks a pod for deletion, the reconciler withdraws the local nexthop and BIRD reannounces the route without it. Sibling replicas absorb traffic before the pod's terminationGracePeriod elapses. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 22:24:50 -05:00
Donavan Fritz	a7dc7bf1f4	anycast: kernel multipath route + L4 hash for multi-pod-per-node. Move pure resolver logic out of anycast_linux.go into anycast.go so it's unit-testable on any host. Reshape anycastTarget from a single {hostIface, via} into a sorted list of nexthops; multiple Ready pods on the same node binding the same anycast IP now contribute one nexthop each. installAnycastRoute uses RTA_MULTIPATH (via netlink.Route.MultiPath) when the target has more than one nexthop. Single-nexthop targets keep the simple via-route shape, so 1-pod-per-node keeps rendering identically to today's production form in `ip route show`. flock-agent writes net.ipv{4,6}.fib_multipath_hash_policy = 1 at startup so the kernel hashes flows on (saddr, daddr, sport, dport, proto) rather than just IPs. The write is best-effort: the agent runs privileged in production, so it succeeds there; in environments where the write fails, the kernel falls back to the L3 hash (which only matters for the multi-pod-per-node case anyway). resolveAnycastTargets sorts nexthops by canonical(via) for stable comparison so a quiet reconcile pass doesn't churn the kernel route. 8 new unit tests cover: 1-pod, 2-pods-same-anycast (multi-nexthop), NotReady drop, no-Ready omits the IP, pending skipped, mixed v6+v4, family mismatch warns, determinism. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 09:57:32 -05:00
Donavan Fritz	2082df37e5	anycast: revert to lo + add via=pod-eth0 next-hop on host route. Reverts the eth0-placement hack from `e1e9544`. The design doc's lo placement is correct. Real fix: the host's anycast /128 (or /32) route now uses the pod's own eth0 unicast IP (same family) as the route's `via` next-hop. The kernel then does NDP/ARP for that eth0 IP, which IS configured on the pod's eth0, so the pod responds normally with no proxy_ndp / proxy_arp trickery on the anycast IP itself. Resulting routes: `ip -6 route add <anycast>/128 via <pod-eth0-v6> dev flock<8hex>` and `ip -4 route add <anycast>/32 via <pod-eth0-v4> dev flock<8hex>`. Validation: an anycast IP whose family the pod has no unicast address for is skipped with a warning (a v4 anycast on an IPv6-only pod cannot be neighbor-resolved this way; dual-stack is required). Bonus cleanup: ESRCH from RouteDel is treated as success (idempotent). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 08:02:51 -05:00
Donavan Fritz	89a3502446	M6: anycast - pod lo + Ready-gated /128/32 + BIRD export. CNI ADD now adds anycast IPs to the pod's lo interface (NOT eth0; design doc rationale: avoid NDP/ARP DAD conflicts when N replicas share an IP). Allocation persists the anycast list. AnycastReconciler: desired = { ip → flock<8hex> } from committed allocations × pod.Status.PodReady=True; diff against advertised; install/remove host /128 (v6) or /32 (v4); re-render bird.conf with the active set. Triggers: 2s tick, AfterCommit (per ADD/DEL), Pod informer Ready transitions (PodCache.OnReadyChange callback). The bird template already supported Anycast6/Anycast4 via the export filter; this commit finally drives those slices from runtime. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 07:36:47 -05:00

4 Commits