flock

Author	SHA1	Message	Date
Donavan Fritz	c860e9351b	ip-algo: pod annotation > NodeConfig annotation > random Add flock.fritzlab.net/ip-algo as a node-wide default via NodeConfig metadata.annotations. Pod-level annotation still wins. Empty, missing, or invalid input at either level falls through to the next; invalid values warn-log via the agent's slog. Both unset → fully random IID (unchanged baseline). ParseAnnotations no longer touches ip-algo; ResolveIPAlgo handles the full precedence chain, called from PodHandler.Add with the cached NodeConfig's annotations and the agent logger. Tests: 9 new TestResolveIPAlgo_* cases covering pod-wins, all fall-through paths, both-absent, nil node map, whitespace, and duplicate-as-invalid. Fuzz target rebuilt without ip-algo input space (now exercised by ResolveIPAlgo unit tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 11:09:09 -05:00
Donavan Fritz	a6202a36bd	defaults: built-in baseline is dual-stack (IPv6 + IPv4), not IPv6-only BuiltinFamilyDefaults() now returns {WantV6: true, WantV4: true}. Pods that want a single family explicitly opt out via the flock.fritzlab.net/ipv4 (or ipv6) annotation, or the operator narrows the default at the node level via NodeConfig.Spec.Defaults. Annotation precedence is unchanged: pod annotation > NodeConfig defaults > built-in baseline. Tests updated to reflect the new baseline; the "opt out of v4" path now has explicit coverage. Docs updated: - NodeConfig.Spec.Defaults Go doc + CRD descriptions reflect the new baseline and its overrides - README opening framing softened from "IPv6-first" to "dual-stack, IPv6-friendly"; example pods + spec.defaults table flipped to treat dual-stack as the default and v6/v4-only as overrides - README NetworkPolicy line in the comparison table flipped to "yes (nftables)" since v1 enforcement shipped - Limitations note about IPv4-only destinations rewritten: every pod has v4 by default now, so the question is whether your IPv4 pool is routable beyond your network. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 10:07:48 -05:00
Donavan Fritz	a7dc7bf1f4	anycast: kernel multipath route + L4 hash for multi-pod-per-node Move pure resolver logic out of anycast_linux.go into anycast.go so it's unit-testable on any host. Reshape anycastTarget from a single {hostIface, via} into a sorted list of nexthops; multiple Ready pods on the same node binding the same anycast IP now contribute one nexthop each. installAnycastRoute uses RTA_MULTIPATH (via netlink.Route.MultiPath) when the target has more than one nexthop. Single-nexthop targets keep the simple via-route shape so 1-pod-per-node keeps rendering identically to today's production form in `ip route show`. flock-agent writes net.ipv{4,6}.fib_multipath_hash_policy = 1 at startup so the kernel hashes flows on (saddr, daddr, sport, dport, proto) rather than just IPs. Best-effort: runs privileged in production, so it works; falls back to L3 hash on environments where the write fails (only matters for the multi-pod-per-node case anyway). resolveAnycastTargets sorts nexthops by canonical(via) for stable comparison so a quiet reconcile pass doesn't churn the kernel route. 8 new unit tests cover: 1-pod, 2-pods-same-anycast (multi-nexthop), NotReady drop, no-Ready omits the IP, pending skipped, mixed v6+v4, family mismatch warns, determinism. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 09:57:32 -05:00
Donavan Fritz	5d9b6bfeec	netpol: anchor base-chain jump on veth only, not pod IP The previous base-chain jump matched iifname/oifname AND saddr/daddr == pod eth0 IP. Anycast traffic has the anycast IP as daddr, not the pod's eth0 unicast, so anycast packets skipped the policy chain entirely and fell through to the forward chain's policy=accept. The veth uniquely belongs to one pod. Anything traversing it is to or from that pod by definition (anycast, unicast, future overlay routes). Match on iifname/oifname alone; let the pod-side chain's accept lines + trailing drop be the policy. Validated end-to-end on host001: anycast nginx pod with default-deny ingress NetPol now correctly drops traffic from any peer; adding an allow-from-podSelector rule unblocks only the matched peer. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 09:32:08 -05:00
Donavan Fritz	39ede9130b	netpol: NetworkPolicy v1 enforcement via nftables New pkg/agent/netpol implementing standard networking.k8s.io/v1 NetworkPolicy. Pipeline: pods + policies + namespaces → Translate → Render → Apply. Supports ingress + egress, all three peer types (podSelector, namespaceSelector, ipBlock with except), numeric ports + port ranges, default-deny semantics derived from PolicyTypes (or inferred from non-empty Spec.Egress when unset). Apply path is `nft -f -` shell-out: single transaction, atomic, kernel guarantees partial-failure rollback. Idempotent dedup via last-applied script. Reconcile triggers: informer events, 30s self-heal tick, every CNI ADD/DEL. Verified against the three live cluster NetPols (calico-apiserver, remote-proxies/lodge-home-assistant, storage/garage-admin-restrict). Fuzz target stitches Translate + Render with random selector and peer inputs; 21 unit tests cover the policy semantics. Named ports skip with a warn; deferred until kubelet exposes them in a form that doesn't require shadowing pod state. Dockerfile: + nftables. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 09:25:58 -05:00
Donavan Fritz	71e584cf96	NodeConfig defaults + code-quality pass + fuzz tests + README NodeConfig.Spec.Defaults adds per-node IPv6/IPv4 family defaults that pod annotations can override; built-in baseline (v6=true, v4=false) still applies when the field is omitted. bird.Render now validates every operator-supplied value (peer addresses, CIDRs, anycast IPs, source addresses) before templating — fuzz found a peer address containing `}` produced unbalanced braces in bird.conf. Failing input preserved as a regression seed. Fuzz targets added for ParseAnnotations, ParseCNIArgs, HostIfaceName, canonical, IPAM allocate sequences, embed.Embed, and bird.Render. Hardened canonical/ipToU32 against nil and non-IPv4 inputs. README rewritten for outside readers — quickstart, NodeConfig + annotation reference with worked examples, anycast use cases, comparison vs Calico and Cilium, requirements, limitations. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 09:25:45 -05:00
Donavan Fritz	677aec2a42	bird: leading-edge reload + 500ms cooldown (was trailing 500ms debounce) A single Ready/NotReady transition no longer pays a 500ms reload wait: the first call to scheduleReload fires birdc immediately; further calls within 500ms are coalesced into one tail reload at the cooldown's end. Burst behavior is the same as before: under heavy churn (deploy rolling all replicas at once), at most one reload per 500ms. Steady-state latency from pod Ready transition to crt001 BGP withdraw: - probe period (set in pod spec, 1s minimum) - ~ms informer + reconcile + birdc + BGP UPDATE. The 500ms hardcoded delay is gone. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 08:26:34 -05:00
Donavan Fritz	3117d00210	bird: declare anycast as protocol static; filter static→kernel export Two coupled changes that fix the anycast advertisement path: 1. Add anycast /128 + /32 prefixes as `route … blackhole` lines in the protocol static stanzas. BIRD's master tables pick them up at preference 200 (higher than kernel-learned routes), so they're the ones the BGP export filter sees. 2. The kernel protocol's export filter now rejects RTS_STATIC. Without this, BIRD would push its blackhole back into the kernel, clobbering the agent-installed `<anycast> via <pod-eth0> dev flock<8hex>` route that's actually responsible for forwarding to the pod. Result: BIRD has the route to advertise via BGP; the kernel has the right route to forward; nothing fights over the kernel table. Replaces the abandoned `gateway recursive` attempt; that's a BIRD 1.x keyword, not BIRD 2.15. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 08:16:45 -05:00
Donavan Fritz	7ac497249f	bird: gateway recursive on BGP protocols Default is `gateway direct`: BIRD silently rejects kernel routes whose via address isn't on a directly-connected network interface. Our anycast host routes use a pod /128 (or /32) as via, which is itself a kernel route on a flock veth, not a connected network. With `gateway recursive`, BIRD does a recursive lookup and accepts the kernel route. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 08:09:00 -05:00
Donavan Fritz	2082df37e5	anycast: revert to lo + add via=pod-eth0 next-hop on host route Reverts the eth0-placement hack from `e1e9544`. The design doc's lo placement is correct. Real fix: the host's anycast /128 (or /32) route now uses the pod's own eth0 unicast IP (same family) as the route's `via` next-hop. The kernel then does NDP/ARP for that eth0 IP (which IS configured on the pod's eth0), so the pod responds normally with no proxy_ndp / proxy_arp trickery on the anycast IP itself. `ip -6 route add <anycast>/128 via <pod-eth0-v6> dev flock<8hex>` `ip -4 route add <anycast>/32 via <pod-eth0-v4> dev flock<8hex>` Validation: an anycast IP whose family the pod doesn't have a unicast for is skipped with a warn (a v4 anycast on an IPv6-only pod cannot be ARP-resolved this way; require dual-stack). Bonus cleanup: ESRCH from RouteDel is treated as success (idempotent). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 08:02:51 -05:00
Donavan Fritz	e1e9544e2e	anycast: put IP on pod eth0, not lo The design doc's lo placement was motivated by avoiding NDP/ARP DAD conflicts "across nodes advertising the same IP", but flock pods each sit on their own /64 veth subnet. DAD on eth0 only sees the host peer, no cross-node L2. With the IP on lo, the pod kernel doesn't reply to NDP solicits arriving on eth0 (Linux default: answer NDP only for addresses on the receiving interface). The host route `<ip>/128 dev flock<8hex>` causes the host to do NDP for the destination on the veth; pod ignores; packet drops silently between forwarding decision and transmit. Symptom: v4 anycast works (proxy_arp=1 on the host veth handles ARP), v6 anycast doesn't. Putting on eth0 makes NDP just work. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 07:55:12 -05:00
Donavan Fritz	3f6dfd3e88	bird: add `source address` + `next hop self` (v6 anycast fix) Cisco IOS rejects IPv6 BGP advertisements whose next-hop is link-local-only. BIRD2 was synthesising a link-local next-hop for kernel-learned routes whose dev had no via gateway (our anycast /128s). Symptom: v4 anycast worked (Cisco doesn't have the same constraint for /32s), v6 anycast didn't make it past crt001. - pkg/routing/bird/config.go: NodeBGP.LocalV6/LocalV4. Template now emits `local <addr> as <asn>` and `next hop self;` in the BGP channel for both families, mirroring Calico's `source address` + `next hop self` pattern. - pkg/agent/bird.go: localAddrSameSubnet picks an interface address on the peer's /64 or /24 to use as source. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 07:45:51 -05:00
Donavan Fritz	89a3502446	M6: anycast - pod lo + Ready-gated /128/32 + BIRD export CNI ADD now adds anycast IPs to the pod's lo interface (NOT eth0; design doc rationale: avoid NDP/ARP DAD conflicts when N replicas share an IP). Allocation persists the anycast list. AnycastReconciler: desired = { ip → flock<8hex> } from committed allocations × pod.Status.PodReady=True; diff against advertised, install/remove host /128 (v6) or /32 (v4); re-render bird.conf with the active set. Triggers: 2s tick, AfterCommit (per ADD/DEL), Pod informer Ready transitions (PodCache.OnReadyChange callback). The bird template already supported Anycast6/Anycast4 via the export filter; this turn finally drives those slices from runtime. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 07:36:47 -05:00
Donavan Fritz	c7fb159632	agent: maintain NetworkUnavailable=False on owned nodes When Calico shuts down on a flock-labeled node, calico-node sets NetworkUnavailable=True with reason CalicoIsDown. Nothing replaces it, so kubelet's NodeController applies node.kubernetes.io/network-unavailable:NoSchedule and new pods can't land. flock-agent now patches Status.Conditions every 60s with NetworkUnavailable=False (reason=FlockReady). RBAC: nodes/status patch. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 23:11:47 -05:00
Donavan Fritz	a1222f13cc	bird: add `learn` + explicit static blackhole protocols BIRD2's protocol kernel does not import kernel routes by default; the import filter on the channel is just for what BIRD has already learned. Added `learn;` so the kernel-installed blackholes (from the agent's SummaryRoutes) are picked up. Also added explicit `protocol static static6/static4` with one `route <cidr> blackhole;` per NodeConfig CIDR. This is belt-and-suspenders: even if `learn` doesn't capture the kernel blackhole, BIRD has the route directly and exports it via the BGP filter. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 23:06:25 -05:00
Donavan Fritz	37cc3f6750	runtime: enable BIRD BGP on flock-labeled nodes Calico fenced off via Tigera Installation CR (apps@2121892). flock-agent now renders bird.conf with the per-node BGP peers; bird sidecar reloads on changes (debounced 500ms). Re-render tick every 15s reacts to NodeConfig updates. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 23:02:33 -05:00
Donavan Fritz	06110884d4	runtime: skip BIRD render on first cutover Calico's calico-node still runs on every node (Tigera-Operator-managed via ArgoCD with selfHeal). Two birds with the same ASN can't peer to crt001 from the same source. Use a manual static route on crt001 for the flock /64 for the first cutover; switch to live BGP after Calico is fenced off flock-labeled nodes. The bird sidecar stays running with the bootstrap config (kernel + device only, no BGP), so flipping live BGP on later is a single-line change in runtime_linux.go. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 22:41:40 -05:00
Donavan Fritz	eb1f5e0d8d	M2: netlink, IPAM/handler wiring, BIRD sidecar, CNI installer Code (Linux build, with no-op stubs for macOS dev): - pkg/agent/netns_linux.go: ensureVeth → host-side configure (addrgenmode none, fe80::1/64, proxy_arp, forwarding) → move peer to pod ns → configure pod side (addr, default route via fe80::1, v4 169.254.1.1 on-link gateway) → host /128 + /32 routes. Idempotent. - pkg/agent/hostiface.go: deterministic host iface name flock<8hex> from FNV-1a-32(containerID). - pkg/agent/annotations.go: parse flock.fritzlab.net/{ipv6,ipv4,cidr6,cidr4,ip-algo,anycast} with design-doc defaults; ParseCNIArgs for the K8S_POD_* keys kubelet sets. - pkg/agent/podinfo.go: shared informer scoped to spec.nodeName==NODE, WaitForPod helper for ADD-vs-informer-sync race. - pkg/agent/handlers.go: PodHandler does cache lookup → annotations → IPAM → store(pending) → SetupFunc → store(committed) → Result. Idempotent on retry. Del symmetric. - pkg/routing/bird/config.go: text/template render with stable ordering; golden tests for host001 + anycast injection + sort stability. - pkg/agent/bird.go: writes /etc/flock/bird/bird.conf, debounces 500ms, execs `birdc -s /run/flock/bird.ctl configure`. Installs blackhole kernel routes for the node summary CIDRs so BIRD's protocol kernel imports them. - pkg/agent/runtime_linux.go: at startup, waits up to 60s for the per-node NodeConfig, reconciles committed allocations into IPAM.used, garbage-collects pending entries, builds PodHandler, swaps RPC handlers in. - cmd/flock-installer: init-container binary that copies /opt/cni/bin/flock and writes 01-flock.conflist (lex-first so kubelet picks it over Calico's 10-calico.conflist on flock-labeled nodes). Deploy: - Dockerfile: alpine + iproute2 + bird2; multi-binary image. - deploy/daemonset.yaml: install-cni init container; bird sidecar sharing /etc/flock/bird + /run/flock with the agent; ConfigMap-seeded bootstrap bird.conf so the sidecar boots before the agent renders. Privileged on flock-agent + install-cni; bird sidecar uses NET_ADMIN/RAW only. - RBAC: pods + networkpolicies get/list/watch (the latter is reserved for M8; harmless to grant now). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 22:33:48 -05:00
Donavan Fritz	31fcae2a97	M2 plumbing: CNI ↔ agent JSON RPC over unix socket Locks the wire format between /opt/cni/bin/flock and flock-agent. ADD returns a CNI Result, DEL returns success/error, CHECK returns success/error. Connection-per-RPC, newline-delimited JSON. - pkg/cni/rpc.go: shared Op + Request + Response + framed encode/decode. - pkg/cni/rpc_client.go: net.Dial + EncodeRequest + DecodeResponse; rpcSocket overridable for tests. - pkg/cni/plugin.go: real implementations of CmdAdd/Del/Check that call through, mapping agent errors to types.Error. - pkg/agent/rpc.go: rpcServer with swappable AddHandler/DelHandler/CheckHandler (defaults: not-implemented for ADD; idempotent-no-op for DEL/CHECK so kubelet teardown of a never-ADDed pod doesn't fail). - pkg/agent/server.go: replaces the M1 accept-and-close placeholder with rpcServer.serve(ctx, listener); listener closes on ctx cancel. Tests cover: Request/Response JSON roundtrip, end-to-end client → unix-socket → fake server, agent error → CNI types.Error mapping. ADD remains "not implemented" until netlink + IPAM wire-up; the agent returns an error and kubelet will fail pod sandbox creation IF a node were configured to use this CNI. host001's CNI plane is still 100% Calico, so this changes nothing observable on the cluster. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 22:21:33 -05:00
Donavan Fritz	c09c62fbaa	pkg/agent/ipam: IPAM allocator with dual-stack + IID embedding Core building block for M2 CNI ADD. Pure logic (no netlink), mutex-serialized, seedable from committed state via MarkInUse. Hooks into pkg/embed for ip-algo IID derivation. - resolveEffective() implements the design-doc cidr6/cidr4 annotation rules: equal→node, supernet→node, subnet→ann, disjoint→error. First-match-wins across multiple annotation CIDRs. - allocV6() random IID within the effective CIDR; on ip-algo, defers to embed.Embed. 16-retry on collision (regenerates IID or N nibble). - allocV4() linear scan skipping .0 (network), .1 (gateway), .<last> (broadcast). Smallest supported block: /30 with 1 usable address. - Deterministic fakeRand in tests covers: intersection matrix, random IID, embed path, collision→retry, v4 skip-gateway, v4 exhaustion, dual-stack, release-then-reallocate, family mismatch rejection. No agent Run-loop integration yet; NewIPAM(nc.Spec.CIDR6, nc.Spec.CIDR4) will be called from Server.Run once netlink + RPC are in place. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 22:14:11 -05:00
Donavan Fritz	759ed21b37	M1.5: NodeConfig dynamic informer + RBAC Agent now watches nodeconfigs.flock.fritzlab.net via a client-go dynamic informer, filters events to its own node name, and caches the typed NodeConfig in memory (NodeConfigCache, atomic pointer). M2's IPAM will read from that cache. - pkg/agent/nodeconfig.go: informer + JSON-round-trip decode (avoids hand-written DeepCopy + scheme registration for this small a use). - pkg/agent/server.go: starts the informer goroutine; Run terminates if the informer returns. - pkg/api/v1alpha1: switch placeholder TypeMeta/ObjectMeta to metav1. - deploy/rbac: get/list/watch on nodeconfigs. - cmd/flock-agent: --kubeconfig flag for out-of-cluster runs (tests). Satisfies M1 verified-by: "kubectl apply NodeConfig; agent logs read it". Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 22:00:48 -05:00
Donavan Fritz	20f47916af	flock M1 scaffold: CNI plugin + agent + NodeConfig CRD - cmd/flock + cmd/flock-agent: build cleanly; CNI ADD/DEL/CHECK return ErrInternal stubs until M2; agent boots, opens unix socket, logs JSON. - pkg/agent/state.go: durable allocations.json (atomic write + fsync + parent fsync); pending/committed lifecycle. Tests cover round-trip, replace-by-cid, version mismatch, no-leak-on-tmp. - pkg/embed/suffix.go: ip-algo IID embedding. Tests cover the /48-/96 nibble distribution table from the design doc, determinism, prefix preservation, N-nibble isolation, digest-vs-fallback divergence. - pkg/api/v1alpha1: minimal NodeConfig types (no controller-runtime yet). - deploy/: NodeConfig CRD, empty ServiceAccount/ClusterRole, DaemonSet pinned to flock.fritzlab.net/agent="" label so it only runs on opted-in nodes. - .gitea/workflows/main.yaml + Dockerfile: build + push to code.fritzlab.net/fritzlab/flock; runs go test in CI. Design doc: dfritzlab/k8s-manager/dfritz-cni.md. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 21:17:42 -05:00

22 Commits
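
Commit eb1f5e0d8d describes the host-side interface name as flock<8hex>, derived from FNV-1a-32(containerID). A minimal sketch of that derivation follows. It assumes the container ID is hashed as raw bytes and rendered as zero-padded lowercase hex; the function signature and the sample ID are illustrative and may differ from what pkg/agent/hostiface.go actually does.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// HostIfaceName returns "flock" followed by 8 lowercase hex digits of the
// 32-bit FNV-1a hash of containerID. The signature is an assumption made
// for this sketch, not taken from the repository.
func HostIfaceName(containerID string) string {
	h := fnv.New32a()
	h.Write([]byte(containerID)) // Write on a hash.Hash never returns an error
	// %08x zero-pads, so the name is always 5 + 8 = 13 characters long.
	return fmt.Sprintf("flock%08x", h.Sum32())
}

func main() {
	// Hypothetical container ID, used only to exercise the function.
	fmt.Println(HostIfaceName("example-container-id"))
}
```

The result is always 13 characters, which fits within the 15-character limit Linux places on interface names, and the same container ID always maps to the same name, so a retried ADD or a later DEL can recompute it without stored state. A 32-bit hash can collide between two container IDs; this sketch does not handle that case.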