M2: netlink, IPAM/handler wiring, BIRD sidecar, CNI installer
Build flock Image / build (push) Has been cancelled
Code (Linux build, with no-op stubs for macOS dev):
- pkg/agent/netns_linux.go: ensureVeth → host-side configure (addrgenmode
none, fe80::1/64, proxy_arp, forwarding) → move peer to pod ns →
configure pod side (addr, default route via fe80::1, v4 169.254.1.1
on-link gateway) → host /128 + /32 routes. Idempotent.
- pkg/agent/hostiface.go: deterministic host iface name flock<8hex> from
FNV-1a-32(containerID).
- pkg/agent/annotations.go: parse flock.fritzlab.net/{ipv6,ipv4,cidr6,
cidr4,ip-algo,anycast} with design-doc defaults; ParseCNIArgs for the
K8S_POD_* keys kubelet sets.
- pkg/agent/podinfo.go: shared informer scoped to spec.nodeName==NODE,
WaitForPod helper for ADD-vs-informer-sync race.
- pkg/agent/handlers.go: PodHandler does
cache lookup → annotations → IPAM → store(pending) → SetupFunc →
store(committed) → Result. Idempotent on retry. Del symmetric.
- pkg/routing/bird/config.go: text/template render with stable ordering;
golden tests for host001 + anycast injection + sort stability.
- pkg/agent/bird.go: writes /etc/flock/bird/bird.conf, debounces 500ms,
execs `birdc -s /run/flock/bird.ctl configure`. Installs blackhole
kernel routes for the node summary CIDRs so BIRD's protocol kernel
imports them.
- pkg/agent/runtime_linux.go: at startup, waits up to 60s for the per-
node NodeConfig, reconciles committed allocations into IPAM.used,
garbage-collects pending entries, builds PodHandler, swaps RPC
handlers in.
- cmd/flock-installer: init-container binary that copies /opt/cni/bin/
flock and writes 01-flock.conflist (lex-first so kubelet picks it
over Calico's 10-calico.conflist on flock-labeled nodes).
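The host-side naming rule from hostiface.go can be sketched with the standard library's `hash/fnv`. This is an illustration under assumptions. The function name `hostIfaceName` and the lowercase `%08x` formatting are guesses; the commit only specifies `flock<8hex>` derived from FNV-1a-32 of the container ID. The 13-character result fits within Linux's 15-character interface-name limit.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hostIfaceName (hypothetical name) derives a deterministic host-side veth
// name: "flock" followed by the 32-bit FNV-1a hash of the container ID as
// 8 lowercase hex digits. The result is always 13 bytes long, which stays
// under the 15-byte Linux interface-name limit (IFNAMSIZ minus the NUL).
func hostIfaceName(containerID string) string {
	h := fnv.New32a()
	h.Write([]byte(containerID)) // hash.Hash.Write never returns an error
	return fmt.Sprintf("flock%08x", h.Sum32())
}

func main() {
	fmt.Println(hostIfaceName("0123456789abcdef"))
}
```

Hashing instead of truncating the container ID keeps the name length fixed and the mapping repeatable across CNI retries. The cost is a small but non-zero chance of two pods on one node colliding in 32 bits.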
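`ParseCNIArgs` reads the `K8S_POD_*` keys out of `CNI_ARGS`, which the runtime passes as semicolon-separated `KEY=VALUE` pairs. What follows is a minimal sketch, assuming that format. The `podRef` type and the `parseCNIArgs` signature are hypothetical and are not the commit's actual API.

```go
package main

import (
	"fmt"
	"strings"
)

// podRef (hypothetical) holds the pod identity carried in CNI_ARGS.
type podRef struct {
	Namespace, Name, InfraContainerID string
}

// parseCNIArgs splits "K1=V1;K2=V2;..." and extracts the K8S_POD_* keys.
// Unknown keys (e.g. IgnoreUnknown) are skipped; a pair without '=' or a
// missing namespace/name is reported as an error.
func parseCNIArgs(args string) (podRef, error) {
	var ref podRef
	for _, kv := range strings.Split(args, ";") {
		if kv == "" {
			continue
		}
		k, v, ok := strings.Cut(kv, "=")
		if !ok {
			return podRef{}, fmt.Errorf("malformed CNI_ARGS pair %q", kv)
		}
		switch k {
		case "K8S_POD_NAMESPACE":
			ref.Namespace = v
		case "K8S_POD_NAME":
			ref.Name = v
		case "K8S_POD_INFRA_CONTAINER_ID":
			ref.InfraContainerID = v
		}
	}
	if ref.Namespace == "" || ref.Name == "" {
		return podRef{}, fmt.Errorf("CNI_ARGS missing K8S_POD_NAMESPACE or K8S_POD_NAME")
	}
	return ref, nil
}

func main() {
	ref, err := parseCNIArgs("IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=web-0;K8S_POD_INFRA_CONTAINER_ID=0123abcd")
	fmt.Printf("%+v %v\n", ref, err)
}
```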
Deploy:
- Dockerfile: alpine + iproute2 + bird2; multi-binary image.
- deploy/daemonset.yaml: install-cni init container; bird sidecar
sharing /etc/flock/bird + /run/flock with the agent; ConfigMap-seeded
bootstrap bird.conf so the sidecar boots before the agent renders.
Privileged on flock-agent + install-cni; bird sidecar uses
NET_ADMIN/NET_RAW only.
- RBAC: pods + networkpolicies get/list/watch (the latter is reserved
for M8 — harmless to grant now).
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
+36 -15
@@ -7,20 +7,18 @@ import (
 	"net"
 	"os"
 	"path/filepath"
+	"time"

 	"k8s.io/client-go/rest"
 	"k8s.io/client-go/tools/clientcmd"
 )

 // SocketPath is the unix socket on which flock-agent serves RPCs from the
-// CNI plugin. Mirrors pkg/cni.SocketPath; kept as a separate constant so the
-// agent package has no import-cycle on the CNI package.
+// CNI plugin.
 const SocketPath = "/run/flock/flock.sock"

-// Server is the agent's runtime container: state store, kubernetes informers,
-// netlink, BIRD, nftables. Current state: state store, NodeConfig informer,
-// RPC dispatcher with stub ADD/DEL/CHECK handlers (will be replaced when
-// netlink + IPAM wire-up lands).
+// Server orchestrates the agent runtime: store, informers, IPAM, netns,
+// BIRD. Run() blocks until ctx is cancelled.
 type Server struct {
 	Node string
 	Store *Store
@@ -31,16 +29,14 @@ type Server struct {
 	restCfg *rest.Config
 }

-// Config configures NewServer.
 type Config struct {
 	Node       string
-	StatePath  string // typically /var/lib/flock/allocations.json
-	Socket     string // typically /run/flock/flock.sock
+	StatePath  string
+	Socket     string
 	Logger     *slog.Logger
-	Kubeconfig string // empty => in-cluster config
+	Kubeconfig string
 }

-// NewServer constructs a Server. It does NOT start any goroutines; call Run.
 func NewServer(cfg Config) (*Server, error) {
 	if cfg.Node == "" {
 		return nil, fmt.Errorf("Node must be set")
@@ -85,9 +81,7 @@ func loadRestConfig(kubeconfig string) (*rest.Config, error) {
 	return rest.InClusterConfig()
 }

-// Run starts the agent and blocks until ctx is cancelled. M1.5 opens the
-// unix listener, starts the NodeConfig informer, and waits. The RPC handler
-// is still a no-op until M2.
+// Run blocks until ctx is cancelled.
 func (s *Server) Run(ctx context.Context) error {
 	if err := os.MkdirAll(filepath.Dir(s.socket), 0o750); err != nil {
 		return fmt.Errorf("mkdir socket dir: %w", err)
@@ -108,12 +102,20 @@ func (s *Server) Run(ctx context.Context) error {
 	// RPC dispatcher takes ownership of the listener.
 	go s.RPC.serve(ctx, l)

-	// NodeConfig informer. Any error from the informer terminates Run.
+	// NodeConfig informer.
 	errCh := make(chan error, 1)
 	go func() {
 		errCh <- StartNodeConfigInformer(ctx, s.restCfg, s.Node, s.NodeConfig, s.Logger)
 	}()

+	// Pod informer + Handlers + Bird are wired up by configureRuntime,
+	// which is platform-specific (real on Linux, no-op stub elsewhere).
+	go func() {
+		if err := s.configureRuntime(ctx); err != nil {
+			s.Logger.Error("runtime configure failed; ADD will return errors", "err", err)
+		}
+	}()
+
 	select {
 	case <-ctx.Done():
 		s.Logger.Info("flock-agent stopping")
@@ -122,3 +124,22 @@ func (s *Server) Run(ctx context.Context) error {
 		return fmt.Errorf("informer: %w", err)
 	}
 }
+
+// firstAvailableNodeConfig polls the cache up to `timeout`. Used to wait
+// for the operator-applied NodeConfig CR before booting the IPAM.
+func (s *Server) firstAvailableNodeConfig(ctx context.Context, timeout time.Duration) error {
+	deadline := time.Now().Add(timeout)
+	for {
+		if s.NodeConfig.Load() != nil {
+			return nil
+		}
+		if time.Now().After(deadline) {
+			return fmt.Errorf("NodeConfig %q not observed within %s", s.Node, timeout)
+		}
+		select {
+		case <-ctx.Done():
+			return ctx.Err()
+		case <-time.After(200 * time.Millisecond):
+		}
+	}
+}