anycast: kernel multipath route + L4 hash for multi-pod-per-node

Move pure resolver logic out of anycast_linux.go into anycast.go so it's
unit-testable on any host. Reshape anycastTarget from a single
{hostIface, via} into a sorted list of nexthops; multiple Ready pods on
the same node binding the same anycast IP now contribute one nexthop
each.

installAnycastRoute uses RTA_MULTIPATH (via netlink.Route.MultiPath)
when the target has more than one nexthop. Single-nexthop targets keep
the simple via-route shape so 1-pod-per-node keeps rendering identically
to today's production form in `ip route show`.

flock-agent writes net.ipv{4,6}.fib_multipath_hash_policy = 1 at
startup so the kernel hashes flows on (saddr, daddr, sport, dport, proto)
rather than just IPs. The write is best-effort: the agent runs privileged
in production, so it succeeds there; where the write fails, the kernel
stays on its default L3 hash, which only matters for the
multi-pod-per-node case.

resolveAnycastTargets sorts nexthops by canonical(via) for stable
comparison so a quiet reconcile pass doesn't churn the kernel route.

8 new unit tests cover: 1-pod, 2-pods-same-anycast (multi-nexthop),
NotReady drop, no-Ready omits the IP, pending skipped, mixed v6+v4,
family mismatch warns, determinism.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Donavan Fritz
2026-04-25 09:57:32 -05:00
parent 5d9b6bfeec
commit a7dc7bf1f4
4 changed files with 436 additions and 73 deletions
@@ -6,11 +6,36 @@ import (
	"context"
	"fmt"
	"net"
	"os"
	"time"

	"code.fritzlab.net/fritzlab/flock/pkg/agent/netpol"
)

// hostMultipathHashSysctls is the set of node-level sysctls flock-agent
// best-effort writes at startup. Default policy 0 hashes only on
// (saddr, daddr); policy 1 adds L4 (sport, dport, proto), giving real
// per-connection ECMP across multipath nexthops — required for sensible
// distribution across multiple anycast pods on the same node.
var hostMultipathHashSysctls = map[string]string{
	"/proc/sys/net/ipv4/fib_multipath_hash_policy": "1",
	"/proc/sys/net/ipv6/fib_multipath_hash_policy": "1",
}

// applyHostSysctls writes the sysctls in m, logging but not failing on
// errors. flock-agent is privileged so this works in the production
// DaemonSet; in environments where it doesn't, single-pod-per-node
// anycast still works (this only affects the multi-pod-per-node case).
func applyHostSysctls(s *Server) {
	for path, value := range hostMultipathHashSysctls {
		if err := os.WriteFile(path, []byte(value), 0o644); err != nil {
			s.Logger.Warn("set host sysctl", "path", path, "value", value, "err", err)
			continue
		}
		s.Logger.Info("host sysctl set", "path", path, "value", value)
	}
}

// configureRuntime wires Pod informer, IPAM, netlink, and BIRD on a real
// Linux node. Steps:
//
@@ -23,6 +48,8 @@ import (
// 5. Build PodHandler and SetHandlers(add, del, check).
// 6. Install BIRD blackhole summary routes + render initial config.
func (s *Server) configureRuntime(ctx context.Context) error {
	applyHostSysctls(s)

	if err := s.firstAvailableNodeConfig(ctx, 60*time.Second); err != nil {
		return err
	}