When flock.fritzlab.net/addresses provides a v6 or v4, the IP becomes
the pod's primary IP for that family — bound to eth0, default route off
it, on-link host route via setHostRoute, and a per-pod /128 or /32 in
BGP. IPAM no longer allocates a private IP alongside it. The pod ends up
with exactly the operator-supplied addresses on eth0 (plus any extras
beyond the first-of-family, which keep the pre-existing layered
behavior).
This is the fix the original addresses-annotation work missed: bug #1
allocated a private IP next to the public one (so VPN-routed clients
could land on the private path on Plex). Promoting addresses-supplied
IPs into the IPAM-style routing slot keeps the public IP as the only
primary IP visible from outside.
Three pieces:
- annotations.go: reject pods whose addresses/anycast IP family is
disabled (ipv6/ipv4 annotation or NodeConfig default). Both annotation
types rely on the family being enabled for return-path routing.
- handlers.go: peel first v6 + first v4 from Addresses into res.IP6/IP4;
suppress IPAM for those families; skip IPAM call entirely if both
families are addresses-supplied.
- anycast_linux.go: extend renderBird to advertise any IPAM IP that's
outside the node's BGP aggregate as a per-pod /32 or /128. This is
what makes 142.202.202.166 reachable when host004's pod CIDR is
172.25.214.0/24 — the addresses-promoted IP isn't covered by the
aggregate.
Tests: 7 new annotation tests covering the conflict cases (ipv4=false +
addresses-v4, NodeConfig default + addresses-v4, etc.) plus 5 unit tests
for the splitAddressesPrimary helper.
README updated with the addresses-replaces-IPAM behavior, the
addresses-vs-anycast comparison, the conflict rule, and a Plex-style
example.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the AnycastReconciler kept a pod in the nexthop set as long as
its PodReady condition was True. During a rolling restart that produces a
window after kubelet has accepted SIGTERM (DeletionTimestamp set, pod
still Ready until probes observe shutdown) where BGP still advertises a
path through the dying pod's veth — in-flight requests get RST'd when
the container actually exits.
Fix: introduce podAnycastEligible(pod) = !DeletionTimestamp && Ready,
swap it in at the AnycastReconciler's isReady callback, and fire the
ready-change callback when DeletionTimestamp transitions (the informer
UpdateFunc previously only fired on Ready transitions).
Result: as soon as the apiserver marks a pod for deletion, the
reconciler withdraws the local nexthop and BIRD reannounces the route
without it. Sibling replicas absorb traffic before the pod's
terminationGracePeriod elapses.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Move pure resolver logic out of anycast_linux.go into anycast.go so it's
unit-testable on any host. Reshape anycastTarget from a single
{hostIface, via} into a sorted list of nexthops; multiple Ready pods on
the same node binding the same anycast IP now contribute one nexthop
each.
installAnycastRoute uses RTA_MULTIPATH (via netlink.Route.MultiPath)
when the target has more than one nexthop. Single-nexthop targets keep
the simple via-route shape so 1-pod-per-node keeps rendering identically
to today's production form in `ip route show`.
flock-agent writes net.ipv{4,6}.fib_multipath_hash_policy = 1 at
startup so the kernel hashes flows on (saddr, daddr, sport, dport, proto)
rather than just IPs. Best-effort — runs privileged in production, so
it works; falls back to L3 hash on environments where the write fails
(only matters for the multi-pod-per-node case anyway).
resolveAnycastTargets sorts nexthops by canonical(via) for stable
comparison so a quiet reconcile pass doesn't churn the kernel route.
8 new unit tests cover: 1-pod, 2-pods-same-anycast (multi-nexthop),
NotReady drop, no-Ready omits the IP, pending skipped, mixed v6+v4,
family mismatch warns, determinism.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Reverts the eth0-placement hack from e1e9544. The design doc's lo
placement is correct.
Real fix: the host's anycast /128 (or /32) route now uses the pod's own
eth0 unicast IP (same family) as the route's `via` next-hop. The kernel
then does NDP/ARP for that eth0 IP — which IS configured on the pod's
eth0 — so the pod responds normally with no proxy_ndp / proxy_arp
trickery on the anycast IP itself.
ip -6 route add <anycast>/128 via <pod-eth0-v6> dev flock<8hex>
ip -4 route add <anycast>/32 via <pod-eth0-v4> dev flock<8hex>
Validation: an anycast IP whose family the pod doesn't have a unicast
for is skipped with a warn (an v4 anycast on an IPv6-only pod cannot be
NDP-resolved this way; require dual-stack).
Bonus cleanup: ESRCH from RouteDel is treated as success (idempotent).
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
CNI ADD now adds anycast IPs to the pod's lo interface (NOT eth0 — design
doc rationale: avoid NDP/ARP DAD conflicts when N replicas share an IP).
Allocation persists the anycast list.
AnycastReconciler:
desired = { ip → flock<8hex> } from
committed allocations × pod.Status.PodReady=True
diff against advertised, install/remove host /128 (v6) or /32 (v4)
re-render bird.conf with the active set
Triggers: 2s tick, AfterCommit (per ADD/DEL), Pod informer Ready
transitions (PodCache.OnReadyChange callback).
The bird template already supported Anycast6/Anycast4 via the export
filter — this turn finally drives those slices from runtime.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>