Commit Graph

7 Commits

Author SHA1 Message Date
Donavan Fritz a6202a36bd defaults: built-in baseline is dual-stack (IPv6 + IPv4), not IPv6-only
Build flock Image / build (push) Has been cancelled
BuiltinFamilyDefaults() now returns {WantV6: true, WantV4: true}. Pods
that want a single family explicitly opt out via the
flock.fritzlab.net/ipv4 (or ipv6) annotation, or the operator narrows
the default at the node level via NodeConfig.Spec.Defaults.

Annotation precedence is unchanged: pod annotation > NodeConfig defaults
> built-in baseline. Tests updated to reflect the new baseline; the
"opt out of v4" path now has explicit coverage.

Docs updated:
  - NodeConfig.Spec.Defaults Go doc + CRD descriptions reflect the new
    baseline and its overrides
  - README opening framing softened from "IPv6-first" to "dual-stack,
    IPv6-friendly"; example pods + spec.defaults table flipped to
    treat dual-stack as the default and v6/v4-only as overrides
  - README NetworkPolicy line in the comparison table flipped to
    "yes (nftables)" since v1 enforcement shipped
  - Limitations note about IPv4-only destinations rewritten — every
    pod has v4 by default now, so the question is whether your IPv4
    pool is routable beyond your network

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 10:07:48 -05:00
Donavan Fritz 5d9b6bfeec netpol: anchor base-chain jump on veth only, not pod IP
Build flock Image / build (push) Has been cancelled
The previous base-chain jump matched iifname/oifname AND saddr/daddr ==
pod eth0 IP. Anycast traffic has the anycast IP as daddr, not the pod's
eth0 unicast — so anycast packets skipped the policy chain entirely and
fell through to the forward chain's policy=accept.

The veth uniquely belongs to one pod. Anything traversing it is to or
from that pod by definition (anycast, unicast, future overlay routes).
Match on iifname/oifname alone; let the pod-side chain's accept lines +
trailing drop be the policy.

Validated end-to-end on host001: anycast nginx pod with default-deny
ingress NetPol now correctly drops traffic from any peer; adding an
allow-from-podSelector rule unblocks only the matched peer.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 09:32:08 -05:00
Donavan Fritz 71e584cf96 NodeConfig defaults + code-quality pass + fuzz tests + README
NodeConfig.Spec.Defaults adds per-node IPv6/IPv4 family defaults that pod
annotations can override; built-in baseline (v6=true, v4=false) still
applies when the field is omitted.

bird.Render now validates every operator-supplied value (peer addresses,
CIDRs, anycast IPs, source addresses) before templating — fuzz found a
peer address containing `}` produced unbalanced braces in bird.conf.
Failing input preserved as a regression seed.

Fuzz targets added for ParseAnnotations, ParseCNIArgs, HostIfaceName,
canonical, IPAM allocate sequences, embed.Embed, and bird.Render.
Hardened canonical/ipToU32 against nil and non-IPv4 inputs.

README rewritten for outside readers — quickstart, NodeConfig + annotation
reference with worked examples, anycast use cases, comparison vs Calico
and Cilium, requirements, limitations.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 09:25:45 -05:00
Donavan Fritz c7fb159632 agent: maintain NetworkUnavailable=False on owned nodes
Build flock Image / build (push) Has been cancelled
When Calico shuts down on a flock-labeled node, calico-node sets
NetworkUnavailable=True with reason CalicoIsDown. Nothing replaces it,
so kubelet's NodeController applies node.kubernetes.io/network-
unavailable:NoSchedule and new pods can't land.

flock-agent now patches Status.Conditions every 60s with
NetworkUnavailable=False (reason=FlockReady). RBAC: nodes/status patch.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 23:11:47 -05:00
Donavan Fritz eb1f5e0d8d M2: netlink, IPAM/handler wiring, BIRD sidecar, CNI installer
Build flock Image / build (push) Has been cancelled
Code (Linux build, with no-op stubs for macOS dev):
- pkg/agent/netns_linux.go: ensureVeth → host-side configure (addrgenmode
  none, fe80::1/64, proxy_arp, forwarding) → move peer to pod ns →
  configure pod side (addr, default route via fe80::1, v4 169.254.1.1
  on-link gateway) → host /128 + /32 routes. Idempotent.
- pkg/agent/hostiface.go: deterministic host iface name flock<8hex> from
  FNV-1a-32(containerID).
- pkg/agent/annotations.go: parse flock.fritzlab.net/{ipv6,ipv4,cidr6,
  cidr4,ip-algo,anycast} with design-doc defaults; ParseCNIArgs for the
  K8S_POD_* keys kubelet sets.
- pkg/agent/podinfo.go: shared informer scoped to spec.nodeName==NODE,
  WaitForPod helper for ADD-vs-informer-sync race.
- pkg/agent/handlers.go: PodHandler does
    cache lookup → annotations → IPAM → store(pending) → SetupFunc →
    store(committed) → Result. Idempotent on retry. Del symmetric.
- pkg/routing/bird/config.go: text/template render with stable ordering;
  golden tests for host001 + anycast injection + sort stability.
- pkg/agent/bird.go: writes /etc/flock/bird/bird.conf, debounces 500ms,
  execs `birdc -s /run/flock/bird.ctl configure`. Installs blackhole
  kernel routes for the node summary CIDRs so BIRD's protocol kernel
  imports them.
- pkg/agent/runtime_linux.go: at startup, waits up to 60s for the per-
  node NodeConfig, reconciles committed allocations into IPAM.used,
  garbage-collects pending entries, builds PodHandler, swaps RPC
  handlers in.
- cmd/flock-installer: init-container binary that copies /opt/cni/bin/
  flock and writes 01-flock.conflist (lex-first so kubelet picks it
  over Calico's 10-calico.conflist on flock-labeled nodes).

Deploy:
- Dockerfile: alpine + iproute2 + bird2; multi-binary image.
- deploy/daemonset.yaml: install-cni init container; bird sidecar
  sharing /etc/flock/bird + /run/flock with the agent; ConfigMap-seeded
  bootstrap bird.conf so the sidecar boots before the agent renders.
  Privileged on flock-agent + install-cni; bird sidecar uses
  NET_ADMIN/RAW only.
- RBAC: pods + networkpolicies get/list/watch (the latter is reserved
  for M8 — harmless to grant now).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 22:33:48 -05:00
Donavan Fritz 759ed21b37 M1.5: NodeConfig dynamic informer + RBAC
Build flock Image / build (push) Has been cancelled
Agent now watches nodeconfigs.flock.fritzlab.net via a client-go dynamic
informer, filters events to its own node name, and caches the typed
NodeConfig in memory (NodeConfigCache, atomic pointer). M2's IPAM will
read from that cache.

- pkg/agent/nodeconfig.go: informer + JSON-round-trip decode (avoids
  hand-written DeepCopy + scheme registration for this small a use).
- pkg/agent/server.go: starts the informer goroutine; Run terminates if
  the informer returns.
- pkg/api/v1alpha1: switch placeholder TypeMeta/ObjectMeta to metav1.
- deploy/rbac: get/list/watch on nodeconfigs.
- cmd/flock-agent: --kubeconfig flag for out-of-cluster runs (tests).

Satisfies M1 verified-by: "kubectl apply NodeConfig; agent logs read it".

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 22:00:48 -05:00
Donavan Fritz 20f47916af flock M1 scaffold: CNI plugin + agent + NodeConfig CRD
Build flock Image / build (push) Has been cancelled
- cmd/flock + cmd/flock-agent: build cleanly; CNI ADD/DEL/CHECK return
  ErrInternal stubs until M2; agent boots, opens unix socket, logs JSON.
- pkg/agent/state.go: durable allocations.json (atomic write + fsync +
  parent fsync); pending/committed lifecycle. Tests cover round-trip,
  replace-by-cid, version mismatch, no-leak-on-tmp.
- pkg/embed/suffix.go: ip-algo IID embedding. Tests cover the /48-/96
  nibble distribution table from the design doc, determinism, prefix
  preservation, N-nibble isolation, digest-vs-fallback divergence.
- pkg/api/v1alpha1: minimal NodeConfig types (no controller-runtime yet).
- deploy/: NodeConfig CRD, empty ServiceAccount/ClusterRole, DaemonSet
  pinned to flock.fritzlab.net/agent="" label so it only runs on opted-in
  nodes.
- .gitea/workflows/main.yaml + Dockerfile: build + push to
  code.fritzlab.net/fritzlab/flock; runs go test in CI.

Design doc: dfritzlab/k8s-manager/dfritz-cni.md.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 21:17:42 -05:00