Commit Graph

5 Commits

Author SHA1 Message Date
Donavan Fritz a17d33e182 agent: addresses annotation replaces IPAM allocation
Build flock Image / build (push) Successful in 5m27s
When flock.fritzlab.net/addresses provides a v6 or v4, the IP becomes
the pod's primary IP for that family — bound to eth0, default route off
it, on-link host route via setHostRoute, and a per-pod /128 or /32 in
BGP. IPAM no longer allocates a private IP alongside it. The pod ends up
with exactly the operator-supplied addresses on eth0 (plus any extras
beyond the first-of-family, which keep the pre-existing layered
behavior).

This is the fix the original addresses-annotation work missed: bug #1
allocated a private IP next to the public one (so VPN-routed clients
could land on the private path on Plex). Promoting addresses-supplied
IPs into the IPAM-style routing slot keeps the public IP as the only
primary IP visible from outside.

Three pieces:
- annotations.go: reject pods whose addresses/anycast IP family is
  disabled (ipv6/ipv4 annotation or NodeConfig default). Both annotation
  types rely on the family being enabled for return-path routing.
- handlers.go: peel first v6 + first v4 from Addresses into res.IP6/IP4;
  suppress IPAM for those families; skip IPAM call entirely if both
  families are addresses-supplied.
- anycast_linux.go: extend renderBird to advertise any IPAM IP that's
  outside the node's BGP aggregate as a per-pod /32 or /128. This is
  what makes 142.202.202.166 reachable when host004's pod CIDR is
  172.25.214.0/24 — the addresses-promoted IP isn't covered by the
  aggregate.

Tests: 7 new annotation tests covering the conflict cases (ipv4=false +
addresses-v4, NodeConfig default + addresses-v4, etc.) plus 5 unit tests
for the splitAddressesPrimary helper.

README updated with the addresses-replaces-IPAM behavior, the
addresses-vs-anycast comparison, the conflict rule, and a Plex-style
example.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 09:46:48 -05:00
Donavan Fritz c61b12204c anycast: drop pods from nexthop set on DeletionTimestamp
Build flock Image / build (push) Has been cancelled
Previously the AnycastReconciler kept a pod in the nexthop set as long as
its PodReady condition was True. During a rolling restart this leaves a
window after termination has begun (DeletionTimestamp set, pod still
Ready until probes observe shutdown) in which BGP still advertises a
path through the dying pod's veth, so in-flight requests get RST'd when
the container actually exits.

Fix: introduce podAnycastEligible(pod) = !DeletionTimestamp && Ready,
swap it in at the AnycastReconciler's isReady callback, and fire the
ready-change callback when DeletionTimestamp transitions (the informer
UpdateFunc previously only fired on Ready transitions).

Result: as soon as the apiserver marks a pod for deletion, the
reconciler withdraws the local nexthop and BIRD reannounces the route
without it. Sibling replicas absorb traffic before the pod's
terminationGracePeriod elapses.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 22:24:50 -05:00
Donavan Fritz a7dc7bf1f4 anycast: kernel multipath route + L4 hash for multi-pod-per-node
Build flock Image / build (push) Has been cancelled
Move pure resolver logic out of anycast_linux.go into anycast.go so it's
unit-testable on any host. Reshape anycastTarget from a single
{hostIface, via} into a sorted list of nexthops; multiple Ready pods on
the same node binding the same anycast IP now contribute one nexthop
each.

installAnycastRoute uses RTA_MULTIPATH (via netlink.Route.MultiPath)
when the target has more than one nexthop. Single-nexthop targets keep
the simple via-route shape so 1-pod-per-node keeps rendering identically
to today's production form in `ip route show`.

flock-agent writes net.ipv{4,6}.fib_multipath_hash_policy = 1 at
startup so the kernel hashes flows on (saddr, daddr, sport, dport, proto)
rather than just IPs. The write is best-effort: the agent runs privileged
in production, so it succeeds there, and where the write fails the kernel
keeps the L3 hash (which only matters for the multi-pod-per-node case
anyway).
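
A rough sketch of such a best-effort sysctl write, assuming the usual /proc/sys file layout. enableL4MultipathHash and its procRoot parameter are hypothetical (the parameter exists so the demo can run against a scratch directory); this is not the agent's actual startup code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// enableL4MultipathHash (hypothetical) writes "1" to
// net/<family>/fib_multipath_hash_policy under procRoot for both families.
// Production code would pass "/proc/sys". Failures are collected rather
// than treated as fatal, so the caller can log them and carry on.
func enableL4MultipathHash(procRoot string) []error {
	var errs []error
	for _, fam := range []string{"ipv4", "ipv6"} {
		path := filepath.Join(procRoot, "net", fam, "fib_multipath_hash_policy")
		if err := os.WriteFile(path, []byte("1\n"), 0o644); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", path, err))
		}
	}
	return errs
}

func main() {
	// Demo against a scratch tree that only has the ipv4 directory, so the
	// ipv6 write fails and exercises the best-effort path.
	root, err := os.MkdirTemp("", "procsys")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.RemoveAll(root)
	if err := os.MkdirAll(filepath.Join(root, "net", "ipv4"), 0o755); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, err := range enableL4MultipathHash(root) {
		fmt.Fprintln(os.Stderr, "warn: L4 multipath hash not set:", err)
	}
}
```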

resolveAnycastTargets sorts nexthops by canonical(via) for stable
comparison so a quiet reconcile pass doesn't churn the kernel route.

8 new unit tests cover: 1-pod, 2-pods-same-anycast (multi-nexthop),
NotReady drop, no-Ready omits the IP, pending skipped, mixed v6+v4,
family mismatch warns, determinism.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 09:57:32 -05:00
Donavan Fritz 2082df37e5 anycast: revert to lo + add via=pod-eth0 next-hop on host route
Build flock Image / build (push) Has been cancelled
Reverts the eth0-placement hack from e1e9544. The design doc's lo
placement is correct.

Real fix: the host's anycast /128 (or /32) route now uses the pod's own
eth0 unicast IP (same family) as the route's `via` next-hop. The kernel
then does NDP/ARP for that eth0 IP — which IS configured on the pod's
eth0 — so the pod responds normally with no proxy_ndp / proxy_arp
trickery on the anycast IP itself.

  ip -6 route add <anycast>/128 via <pod-eth0-v6> dev flock<8hex>
  ip -4 route add <anycast>/32  via <pod-eth0-v4> dev flock<8hex>

Validation: an anycast IP whose family the pod doesn't have a unicast
for is skipped with a warn (a v4 anycast on an IPv6-only pod cannot be
ARP-resolved this way; require dual-stack).

Bonus cleanup: ESRCH from RouteDel is treated as success (idempotent).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 08:02:51 -05:00
Donavan Fritz 89a3502446 M6: anycast — pod lo + Ready-gated /128/32 + BIRD export
Build flock Image / build (push) Has been cancelled
CNI ADD now adds anycast IPs to the pod's lo interface (NOT eth0 — design
doc rationale: avoid NDP/ARP DAD conflicts when N replicas share an IP).
Allocation persists the anycast list.

AnycastReconciler:
  desired = { ip → flock<8hex> } from
            committed allocations × pod.Status.PodReady=True
  diff against advertised, install/remove host /128 (v6) or /32 (v4)
  re-render bird.conf with the active set
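
The diff step above might look roughly like this. diffAnycast and the string-keyed maps are assumptions made for the sketch; re-installing an IP whose host interface changed is also an assumption, since the message does not say how that case is handled.

```go
package main

import (
	"fmt"
	"sort"
)

// diffAnycast (hypothetical) compares the desired ip -> host interface map
// with what is currently advertised. It returns the IPs whose host route
// must be installed (new, or interface changed) and those whose route must
// be removed. Both lists are sorted for deterministic output.
func diffAnycast(desired, advertised map[string]string) (install, remove []string) {
	for ip, iface := range desired {
		if cur, ok := advertised[ip]; !ok || cur != iface {
			install = append(install, ip)
		}
	}
	for ip := range advertised {
		if _, ok := desired[ip]; !ok {
			remove = append(remove, ip)
		}
	}
	sort.Strings(install)
	sort.Strings(remove)
	return install, remove
}

func main() {
	// Addresses and interface names are made up for the example.
	desired := map[string]string{
		"192.0.2.1":        "flock0a1b2c3d",
		"2001:db8:ffff::1": "flock0a1b2c3d",
	}
	advertised := map[string]string{
		"192.0.2.1": "flock0a1b2c3d",
		"192.0.2.2": "flock4e5f6a7b",
	}
	install, remove := diffAnycast(desired, advertised)
	fmt.Println("install:", install)
	fmt.Println("remove:", remove)
}
```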

Triggers: 2s tick, AfterCommit (per ADD/DEL), Pod informer Ready
transitions (PodCache.OnReadyChange callback).

The bird template already supported Anycast6/Anycast4 via the export
filter; this commit finally populates those slices at runtime.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 07:36:47 -05:00