anycast: kernel multipath route + L4 hash for multi-pod-per-node
Move pure resolver logic out of anycast_linux.go into anycast.go so it's
unit-testable on any host. Reshape anycastTarget from a single
{hostIface, via} into a sorted list of nexthops; multiple Ready pods on
the same node binding the same anycast IP now contribute one nexthop
each.
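The grouping described above can be sketched as follows. This is a minimal illustration, not the resolver from the commit: the `pod` and `nexthop` types, their field names, and `resolveTargets` are hypothetical stand-ins, and a pod with an empty `Via` is assumed to be "pending".

```go
package main

// pod is a hypothetical, trimmed-down view of what a resolver would need.
type pod struct {
	Name      string
	Ready     bool
	AnycastIP string // anycast address the pod binds
	HostIface string // host-side interface leading to the pod
	Via       string // pod address used as the gateway; empty while pending
}

// nexthop is one (interface, gateway) pair for an anycast IP.
type nexthop struct {
	HostIface string
	Via       string
}

// resolveTargets groups Ready pods by anycast IP. Every Ready pod adds one
// nexthop; NotReady and pending pods add nothing, so an anycast IP with no
// Ready pod never appears in the result.
func resolveTargets(pods []pod) map[string][]nexthop {
	targets := make(map[string][]nexthop)
	for _, p := range pods {
		if !p.Ready || p.Via == "" {
			continue
		}
		targets[p.AnycastIP] = append(targets[p.AnycastIP],
			nexthop{HostIface: p.HostIface, Via: p.Via})
	}
	return targets
}
```

Because the function touches neither netlink nor the Kubernetes API, it can be exercised with plain slices on any host.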
installAnycastRoute uses RTA_MULTIPATH (via netlink.Route.MultiPath)
when the target has more than one nexthop. Single-nexthop targets keep
the simple via-route shape so 1-pod-per-node keeps rendering identically
to today's production form in `ip route show`.
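A rough sketch of that shape decision, using plain structs in place of netlink.Route so it compiles without the netlink dependency. The `route` type, its fields other than `MultiPath`, and `buildRoute` are assumptions made for illustration.

```go
package main

// nexthop is one (interface, gateway) pair.
type nexthop struct {
	HostIface string
	Via       string
}

// route is a hypothetical stand-in for the kernel route to install.
// Either Via/Dev is set (simple via-route) or MultiPath is (multipath route).
type route struct {
	Dst       string
	Via       string
	Dev       string
	MultiPath []nexthop
}

// buildRoute picks the route shape from the number of nexthops:
// none means nothing to install, one keeps the simple via-route,
// more than one yields a multipath route.
func buildRoute(dst string, hops []nexthop) (route, bool) {
	switch len(hops) {
	case 0:
		return route{}, false
	case 1:
		return route{Dst: dst, Via: hops[0].Via, Dev: hops[0].HostIface}, true
	default:
		// Copy so later changes to the caller's slice cannot alter the route.
		mp := append([]nexthop(nil), hops...)
		return route{Dst: dst, MultiPath: mp}, true
	}
}
```

Keeping the one-nexthop case on the simple path means existing single-pod nodes see no change in their installed route.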
flock-agent writes net.ipv{4,6}.fib_multipath_hash_policy = 1 at
startup so the kernel hashes flows on (saddr, daddr, sport, dport, proto)
rather than just IPs. The write is best-effort: the agent runs privileged
in production, so it succeeds there, and where the write fails the kernel
keeps its default L3 hash (which only matters for the multi-pod-per-node
case anyway).
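The best-effort behaviour can be illustrated with an injectable writer, which makes the "one failure does not stop the rest" property testable without touching /proc. This is a sketch under assumptions: `writeFunc`, `applySysctls`, `osWrite`, and `errDenied` are hypothetical names, and the commit itself calls os.WriteFile directly.

```go
package main

import (
	"errors"
	"os"
	"sort"
)

// writeFunc abstracts the file write so tests can substitute a fake.
type writeFunc func(path string, data []byte) error

// errDenied is a sample error for fakes that simulate an unprivileged host.
var errDenied = errors.New("permission denied")

// hashPolicySysctls holds the two sysctl paths the agent sets to "1".
var hashPolicySysctls = map[string]string{
	"/proc/sys/net/ipv4/fib_multipath_hash_policy": "1",
	"/proc/sys/net/ipv6/fib_multipath_hash_policy": "1",
}

// applySysctls attempts every write and never aborts early. It returns the
// paths that failed, sorted, so the caller can log them and carry on.
func applySysctls(write writeFunc, sysctls map[string]string) []string {
	var failed []string
	for path, value := range sysctls {
		if err := write(path, []byte(value)); err != nil {
			failed = append(failed, path)
		}
	}
	sort.Strings(failed)
	return failed
}

// osWrite is the writer a real agent would pass in.
func osWrite(path string, data []byte) error {
	return os.WriteFile(path, data, 0o644)
}
```

A caller would invoke `applySysctls(osWrite, hashPolicySysctls)` at startup and log, rather than return, any failed paths.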
resolveAnycastTargets sorts nexthops by canonical(via) for stable
comparison so a quiet reconcile pass doesn't churn the kernel route.
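One way to get that stability is sketched below. It assumes `canonical` means "parse the address and print it in its normalized textual form" via net/netip; the real helper may differ, and `sortNexthops` and `sameNexthops` are hypothetical names.

```go
package main

import (
	"net/netip"
	"sort"
)

// nexthop is one (interface, gateway) pair.
type nexthop struct {
	HostIface string
	Via       string
}

// canonical normalizes an IP string so equivalent spellings compare equal.
// Unparsable input is returned unchanged rather than dropped.
func canonical(addr string) string {
	a, err := netip.ParseAddr(addr)
	if err != nil {
		return addr
	}
	return a.String()
}

// sortNexthops orders nexthops by the canonical text of their gateway.
// The order is lexical, not numeric; it only needs to be deterministic.
func sortNexthops(hops []nexthop) {
	sort.SliceStable(hops, func(i, j int) bool {
		return canonical(hops[i].Via) < canonical(hops[j].Via)
	})
}

// sameNexthops reports whether two sorted lists describe the same route,
// in which case a reconcile pass can leave the kernel route alone.
func sameNexthops(a, b []nexthop) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i].HostIface != b[i].HostIface || canonical(a[i].Via) != canonical(b[i].Via) {
			return false
		}
	}
	return true
}
```

With both the desired and the installed nexthops sorted the same way, differences in pod listing order or address spelling no longer register as a change.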
8 new unit tests cover: 1-pod, 2-pods-same-anycast (multi-nexthop),
NotReady drop, no-Ready omits the IP, pending skipped, mixed v6+v4,
family mismatch warns, determinism.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@@ -6,11 +6,36 @@ import (
 	"context"
 	"fmt"
 	"net"
+	"os"
 	"time"
 
 	"code.fritzlab.net/fritzlab/flock/pkg/agent/netpol"
 )
 
+// hostMultipathHashSysctls is the set of node-level sysctls flock-agent
+// best-effort writes at startup. Default policy 0 hashes only on
+// (saddr, daddr); policy 1 adds L4 (sport, dport, proto), giving real
+// per-connection ECMP across multipath nexthops — required for sensible
+// distribution across multiple anycast pods on the same node.
+var hostMultipathHashSysctls = map[string]string{
+	"/proc/sys/net/ipv4/fib_multipath_hash_policy": "1",
+	"/proc/sys/net/ipv6/fib_multipath_hash_policy": "1",
+}
+
+// applyHostSysctls writes the sysctls in m, logging but not failing on
+// errors. flock-agent is privileged so this works in the production
+// DaemonSet; in environments where it doesn't, single-pod-per-node
+// anycast still works (this only affects the multi-pod-per-node case).
+func applyHostSysctls(s *Server) {
+	for path, value := range hostMultipathHashSysctls {
+		if err := os.WriteFile(path, []byte(value), 0o644); err != nil {
+			s.Logger.Warn("set host sysctl", "path", path, "value", value, "err", err)
+			continue
+		}
+		s.Logger.Info("host sysctl set", "path", path, "value", value)
+	}
+}
+
 // configureRuntime wires Pod informer, IPAM, netlink, and BIRD on a real
 // Linux node. Steps:
 //
@@ -23,6 +48,8 @@ import (
 // 5. Build PodHandler and SetHandlers(add, del, check).
 // 6. Install BIRD blackhole summary routes + render initial config.
 func (s *Server) configureRuntime(ctx context.Context) error {
+	applyHostSysctls(s)
+
 	if err := s.firstAvailableNodeConfig(ctx, 60*time.Second); err != nil {
 		return err
 	}