Commit Graph

14 Commits

Author SHA1 Message Date
Donavan Fritz e1e9544e2e anycast: put IP on pod eth0, not lo
Build flock Image / build (push) Has been cancelled
The design doc's lo placement was motivated by avoiding NDP/ARP DAD
conflicts "across nodes advertising the same IP" — but flock pods each
sit on their own /64 veth subnet. DAD on eth0 only sees the host peer,
no cross-node L2.

With the IP on lo, the pod kernel doesn't reply to NDP solicits arriving
on eth0 (Linux default: answer NDP only for addresses on the receiving
interface). The host route `<ip>/128 dev flock<8hex>` causes the host
to do NDP for the destination on the veth; pod ignores; packet drops
silently between forwarding decision and transmit. Symptom: v4 anycast
works (proxy_arp=1 on the host veth handles ARP), v6 anycast doesn't.

Putting on eth0 makes NDP just work.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 07:55:12 -05:00
Donavan Fritz 3f6dfd3e88 bird: add source address + next hop self (v6 anycast fix)
Build flock Image / build (push) Has been cancelled
Cisco IOS rejects IPv6 BGP advertisements whose next-hop is link-local-
only. BIRD2 was synthesising a link-local next-hop for kernel-learned
routes whose dev had no via gateway (our anycast /128s). Symptom: v4
anycast worked (Cisco doesn't have the same constraint for /32s), v6
anycast didn't make it past crt001.

- pkg/routing/bird/config.go: NodeBGP.LocalV6/LocalV4. Template now
  emits `local <addr> as <asn>` and `next hop self;` in the BGP
  channel for both families, mirroring Calico's `source address` +
  `next hop self` pattern.
- pkg/agent/bird.go: localAddrSameSubnet picks an interface address
  on the peer's /64 or /24 to use as source.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 07:45:51 -05:00
Donavan Fritz 89a3502446 M6: anycast — pod lo + Ready-gated /128/32 + BIRD export
Build flock Image / build (push) Has been cancelled
CNI ADD now adds anycast IPs to the pod's lo interface (NOT eth0 — design
doc rationale: avoid NDP/ARP DAD conflicts when N replicas share an IP).
Allocation persists the anycast list.

AnycastReconciler:
  desired = { ip → flock<8hex> } from
            committed allocations × pod.Status.PodReady=True
  diff against advertised, install/remove host /128 (v6) or /32 (v4)
  re-render bird.conf with the active set

Triggers: 2s tick, AfterCommit (per ADD/DEL), Pod informer Ready
transitions (PodCache.OnReadyChange callback).

The bird template already supported Anycast6/Anycast4 via the export
filter — this turn finally drives those slices from runtime.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 07:36:47 -05:00
Donavan Fritz c7fb159632 agent: maintain NetworkUnavailable=False on owned nodes
Build flock Image / build (push) Has been cancelled
When Calico shuts down on a flock-labeled node, calico-node sets
NetworkUnavailable=True with reason CalicoIsDown. Nothing replaces it,
so kubelet's NodeController applies node.kubernetes.io/network-
unavailable:NoSchedule and new pods can't land.

flock-agent now patches Status.Conditions every 60s with
NetworkUnavailable=False (reason=FlockReady). RBAC: nodes/status patch.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 23:11:47 -05:00
Donavan Fritz a1222f13cc bird: add learn + explicit static blackhole protocols
Build flock Image / build (push) Has been cancelled
BIRD2's protocol kernel does not import kernel routes by default; the
import filter on the channel is just for what BIRD has already learned.
Added `learn;` so the kernel-installed blackholes (from the agent's
SummaryRoutes) are picked up.

Also added explicit `protocol static static6/static4` with one
`route <cidr> blackhole;` per NodeConfig CIDR. This is belt-and-
suspenders: even if `learn` doesn't capture the kernel blackhole, BIRD
has the route directly and exports it via the BGP filter.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 23:06:25 -05:00
Donavan Fritz 37cc3f6750 runtime: enable BIRD BGP on flock-labeled nodes
Build flock Image / build (push) Has been cancelled
Calico fenced off via Tigera Installation CR (apps@2121892). flock-agent
now renders bird.conf with the per-node BGP peers; bird sidecar reloads
on changes (debounced 500ms). Re-render tick every 15s reacts to
NodeConfig updates.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 23:02:33 -05:00
Donavan Fritz 06110884d4 runtime: skip BIRD render on first cutover
Build flock Image / build (push) Has been cancelled
Calico's calico-node still runs on every node (Tigera-Operator-managed
via ArgoCD with selfHeal). Two birds with the same ASN can't peer to
crt001 from the same source. Use a manual static route on crt001 for
the flock /64 for the first cutover; switch to live BGP after Calico is
fenced off flock-labeled nodes.

The bird sidecar stays running with the bootstrap config (kernel +
device only, no BGP), so flipping live BGP on later is a single-line
change in runtime_linux.go.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 22:41:40 -05:00
Donavan Fritz 0e1833dc7a ci: fix bird package name (alpine ships 'bird', not 'bird2')
Build flock Image / build (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 22:36:15 -05:00
Donavan Fritz eb1f5e0d8d M2: netlink, IPAM/handler wiring, BIRD sidecar, CNI installer
Build flock Image / build (push) Has been cancelled
Code (Linux build, with no-op stubs for macOS dev):
- pkg/agent/netns_linux.go: ensureVeth → host-side configure (addrgenmode
  none, fe80::1/64, proxy_arp, forwarding) → move peer to pod ns →
  configure pod side (addr, default route via fe80::1, v4 169.254.1.1
  on-link gateway) → host /128 + /32 routes. Idempotent.
- pkg/agent/hostiface.go: deterministic host iface name flock<8hex> from
  FNV-1a-32(containerID).
- pkg/agent/annotations.go: parse flock.fritzlab.net/{ipv6,ipv4,cidr6,
  cidr4,ip-algo,anycast} with design-doc defaults; ParseCNIArgs for the
  K8S_POD_* keys kubelet sets.
- pkg/agent/podinfo.go: shared informer scoped to spec.nodeName==NODE,
  WaitForPod helper for ADD-vs-informer-sync race.
- pkg/agent/handlers.go: PodHandler does
    cache lookup → annotations → IPAM → store(pending) → SetupFunc →
    store(committed) → Result. Idempotent on retry. Del symmetric.
- pkg/routing/bird/config.go: text/template render with stable ordering;
  golden tests for host001 + anycast injection + sort stability.
- pkg/agent/bird.go: writes /etc/flock/bird/bird.conf, debounces 500ms,
  execs `birdc -s /run/flock/bird.ctl configure`. Installs blackhole
  kernel routes for the node summary CIDRs so BIRD's protocol kernel
  imports them.
- pkg/agent/runtime_linux.go: at startup, waits up to 60s for the per-
  node NodeConfig, reconciles committed allocations into IPAM.used,
  garbage-collects pending entries, builds PodHandler, swaps RPC
  handlers in.
- cmd/flock-installer: init-container binary that copies /opt/cni/bin/
  flock and writes 01-flock.conflist (lex-first so kubelet picks it
  over Calico's 10-calico.conflist on flock-labeled nodes).

Deploy:
- Dockerfile: alpine + iproute2 + bird2; multi-binary image.
- deploy/daemonset.yaml: install-cni init container; bird sidecar
  sharing /etc/flock/bird + /run/flock with the agent; ConfigMap-seeded
  bootstrap bird.conf so the sidecar boots before the agent renders.
  Privileged on flock-agent + install-cni; bird sidecar uses
  NET_ADMIN/RAW only.
- RBAC: pods + networkpolicies get/list/watch (the latter is reserved
  for M8 — harmless to grant now).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 22:33:48 -05:00
Donavan Fritz 31fcae2a97 M2 plumbing: CNI ↔ agent JSON RPC over unix socket
Build flock Image / build (push) Has been cancelled
Locks the wire format between /opt/cni/bin/flock and flock-agent. ADD
returns a CNI Result, DEL returns success/error, CHECK returns
success/error. Connection-per-RPC, newline-delimited JSON.

- pkg/cni/rpc.go: shared Op + Request + Response + framed encode/decode.
- pkg/cni/rpc_client.go: net.Dial + EncodeRequest + DecodeResponse;
  rpcSocket overridable for tests.
- pkg/cni/plugin.go: real implementations of CmdAdd/Del/Check that call
  through, mapping agent errors to types.Error.
- pkg/agent/rpc.go: rpcServer with swappable AddHandler/DelHandler/
  CheckHandler (defaults: not-implemented for ADD; idempotent-no-op for
  DEL/CHECK so kubelet teardown of a never-ADDed pod doesn't fail).
- pkg/agent/server.go: replaces the M1 accept-and-close placeholder
  with rpcServer.serve(ctx, listener); listener closes on ctx cancel.

Tests cover: Request/Response JSON roundtrip, end-to-end client →
unix-socket → fake server, agent error → CNI types.Error mapping.

ADD remains "not implemented" until netlink + IPAM wire-up — the agent
returns an error and kubelet will fail pod sandbox creation IF a node
were configured to use this CNI. host001's CNI plane is still 100%
Calico, so this changes nothing observable on the cluster.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 22:21:33 -05:00
Donavan Fritz c09c62fbaa pkg/agent/ipam: IPAM allocator with dual-stack + IID embedding
Build flock Image / build (push) Has been cancelled
Core building block for M2 CNI ADD. Pure logic (no netlink), mutex-
serialized, seedable from committed state via MarkInUse. Hooks into
pkg/embed for ip-algo IID derivation.

- resolveEffective() implements the design-doc cidr6/cidr4 annotation
  rules: equal→node, supernet→node, subnet→ann, disjoint→error.
  First-match-wins across multiple annotation CIDRs.
- allocV6() random IID within the effective CIDR; on ip-algo, defers
  to embed.Embed. 16-retry on collision (regenerates IID or N nibble).
- allocV4() linear scan skipping .0 (network), .1 (gateway), .<last>
  (broadcast). Smallest supported block: /30 with 1 usable address.
- Deterministic fakeRand in tests covers: intersection matrix, random
  IID, embed path, collision→retry, v4 skip-gateway, v4 exhaustion,
  dual-stack, release-then-reallocate, family mismatch rejection.

No agent Run-loop integration yet — NewIPAM(nc.Spec.CIDR6, nc.Spec.CIDR4)
will be called from Server.Run once netlink + RPC are in place.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 22:14:11 -05:00
Donavan Fritz 759ed21b37 M1.5: NodeConfig dynamic informer + RBAC
Build flock Image / build (push) Has been cancelled
Agent now watches nodeconfigs.flock.fritzlab.net via a client-go dynamic
informer, filters events to its own node name, and caches the typed
NodeConfig in memory (NodeConfigCache, atomic pointer). M2's IPAM will
read from that cache.

- pkg/agent/nodeconfig.go: informer + JSON-round-trip decode (avoids
  hand-written DeepCopy + scheme registration for this small a use).
- pkg/agent/server.go: starts the informer goroutine; Run terminates if
  the informer returns.
- pkg/api/v1alpha1: switch placeholder TypeMeta/ObjectMeta to metav1.
- deploy/rbac: get/list/watch on nodeconfigs.
- cmd/flock-agent: --kubeconfig flag for out-of-cluster runs (tests).

Satisfies M1 verified-by: "kubectl apply NodeConfig; agent logs read it".

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 22:00:48 -05:00
Donavan Fritz e0ae98ad6c ci: move go test into Dockerfile build stage
Build flock Image / build (push) Has been cancelled
The runner runs jobs via act + DinD; `docker run -v "$PWD:/src"` from
inside the job container mounts the runner-job filesystem, not the
docker daemon's host fs, so the mount appears empty and `go test ./...`
fails with "directory prefix . does not contain main module".

Run tests in the same container that builds — same workspace, no mount.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 21:19:10 -05:00
Donavan Fritz 20f47916af flock M1 scaffold: CNI plugin + agent + NodeConfig CRD
Build flock Image / build (push) Has been cancelled
- cmd/flock + cmd/flock-agent: build cleanly; CNI ADD/DEL/CHECK return
  ErrInternal stubs until M2; agent boots, opens unix socket, logs JSON.
- pkg/agent/state.go: durable allocations.json (atomic write + fsync +
  parent fsync); pending/committed lifecycle. Tests cover round-trip,
  replace-by-cid, version mismatch, no-leak-on-tmp.
- pkg/embed/suffix.go: ip-algo IID embedding. Tests cover the /48-/96
  nibble distribution table from the design doc, determinism, prefix
  preservation, N-nibble isolation, digest-vs-fallback divergence.
- pkg/api/v1alpha1: minimal NodeConfig types (no controller-runtime yet).
- deploy/: NodeConfig CRD, empty ServiceAccount/ClusterRole, DaemonSet
  pinned to flock.fritzlab.net/agent="" label so it only runs on opted-in
  nodes.
- .gitea/workflows/main.yaml + Dockerfile: build + push to
  code.fritzlab.net/fritzlab/flock; runs go test in CI.

Design doc: dfritzlab/k8s-manager/dfritz-cni.md.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 21:17:42 -05:00