ci: push image to fritzlab-public org

This repo was transferred from fritzlab to fritzlab-public so the container package's anonymous-pull access (governed by org visibility in Gitea 1.26.1) remains open after the rest of fritzlab/* flips to limited.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
deploy: catch-all toleration so DS schedules on not-ready nodes
2026-05-28 13:58:56 -05:00 · 2026-05-08 09:35:27 -05:00 · 2026-05-06 08:14:35 -05:00 · 2026-05-04 21:03:59 -05:00 · 2026-04-29 09:46:48 -05:00 · 2026-04-28 18:37:05 -05:00
51 changed files with 5915 additions and 348 deletions
@@ -1,55 +1,24 @@
-name: Build flock Image
+name: flock
 on:
   push:
     branches: [main]
 jobs:
-  build:
+  release:
     runs-on: fritzlab
     steps:
-      - name: Check out repo
-        uses: actions/checkout@v4
-      - name: Log in to Gitea registry
-        uses: docker/login-action@v3
-        with:
-          registry: code.fritzlab.net
-          username: ci-bot
-          password: ${{ secrets.REGISTRY_PASSWORD }}
-      - name: Extract Docker metadata
-        id: meta
-        uses: docker/metadata-action@v5
-        with:
-          images: code.fritzlab.net/fritzlab/flock
-          tags: |
-            type=raw,value=latest
-            type=raw,value=${{ github.run_number }}
-      - name: Build and push
-        uses: docker/build-push-action@v6
-        with:
-          context: .
-          push: true
-          provenance: false
-          build-args: |
-            GIT_SHA=${{ github.sha }}
-          tags: ${{ steps.meta.outputs.tags }}
-          labels: ${{ steps.meta.outputs.labels }}
-          network: host
-      - name: Smoke-test image
-        run: |
-          docker run --rm code.fritzlab.net/fritzlab/flock:${{ github.run_number }} --help || true
-          docker run --rm --entrypoint /usr/local/bin/flock \
-            code.fritzlab.net/fritzlab/flock:${{ github.run_number }} || true
-      - name: Clean up old image tags
-        run: |
-          tea login add --name ci --url https://code.fritzlab.net --token '${{ secrets.CI_BOT_TOKEN }}' --no-version-check
-          tea api '/packages/fritzlab?type=container' \
-            | jq -r '.[] | select(.name=="flock") | select(.version | test("^[0-9]+$")) | .version' \
-            | sort -n | head -n -3 \
-            | while read tag; do
-                echo "deleting flock:$tag"
-                tea api -X DELETE "/packages/fritzlab/container/flock/$tag"
-              done
+      - uses: actions/checkout@v4
+      - uses: https://code.fritzlab.net/action/image-build@v1
+        with:
+          image: code.fritzlab.net/fritzlab-public/flock
+          build-args: GIT_SHA=${{ github.sha }}
+          smoke-test: |
+            docker run --rm $IMAGE --help || true
+            docker run --rm --entrypoint /usr/local/bin/flock $IMAGE || true
+      - uses: https://code.fritzlab.net/action/image-push@v1
+        with:
+          image: code.fritzlab.net/fritzlab-public/flock
+          token: ${{ secrets.CI_BOT_TOKEN }}
+          org: fritzlab-public
+          name: flock
@@ -0,0 +1,16 @@
+name: flock PR validation
+on:
+  pull_request:
+    branches: [main]
+jobs:
+  validate:
+    runs-on: fritzlab
+    steps:
+      - uses: actions/checkout@v4
+      - uses: https://code.fritzlab.net/action/image-build@v1
+        with:
+          image: code.fritzlab.net/fritzlab/flock
+          build-args: GIT_SHA=${{ github.sha }}
+          smoke-test: |
+            docker run --rm $IMAGE --help || true
+            docker run --rm --entrypoint /usr/local/bin/flock $IMAGE || true
@@ -21,7 +21,7 @@ RUN CGO_ENABLED=0 go build -trimpath \
        -o /out/flock-installer ./cmd/flock-installer
 FROM alpine:3.21
-RUN apk add --no-cache iproute2 bird ca-certificates
+RUN apk add --no-cache iproute2 bird nftables ca-certificates
 COPY --from=build /out/flock           /usr/local/bin/flock
 COPY --from=build /out/flock-agent     /usr/local/bin/flock-agent
 COPY --from=build /out/flock-installer /usr/local/bin/flock-installer
@@ -1,22 +1,398 @@
 # flock
-Kubernetes CNI for sjc001. Per-pod IPv4 opt-in, IID embedding, Ready-gated anycast via BGP.
-Design doc: `k8s-manager/dfritz-cni.md` (in the operator's k8s-manager repo).
-Status: M1 scaffold. Not functional. See milestones table in the design doc.
-## Layout
- `cmd/flock` — CNI plugin binary (kubelet-invoked)
- `cmd/flock-agent` — DaemonSet binary
- `pkg/api/v1alpha1` — `NodeConfig` CRD types
- `pkg/cni` — CNI plugin internals + RPC client
- `pkg/agent` — agent server, IPAM, state file, anycast, NetworkPolicy
- `pkg/embed` — `ip-algo` IID embedding (pure)
- `pkg/routing/{bird,ospf}` — routing backends
- `deploy/` — CRDs, RBAC, DaemonSet manifests
+A small, opinionated Kubernetes CNI built around three ideas:
+1. **Dual-stack, IPv6-friendly.** Every pod gets a globally routable IPv6
   address by default. IPv4 is also enabled by default; either family can
   be turned off per-node or per-pod when you really mean to.
 2. **No tunnels, no NAT.** Pod addresses are the real packets on the wire.
   Each node speaks BGP to its upstream router and advertises its own
   per-node prefix. The pod network is just the LAN, plus host routes.
 3. **Anycast as a primitive.** A pod can request an anycast address via
   an annotation; flock binds it on the pod's loopback and advertises a
   `/128` (or `/32`) over BGP, but only while the pod is `Ready`. Multiple
   replicas advertise the same address from different nodes for ECMP load
   balancing without a separate Service or external LB.
+flock is built for clusters where every node already speaks BGP to one
 or more upstream routers. It deliberately leaves out features you'd
 expect from a general-purpose CNI — overlays, IPsec/Wireguard, IPAM
 coordination across nodes, kube-proxy integration — so the moving parts
 that remain are easy to reason about.
+> **Status:** alpha. CRD shape and annotation keys may still change.
+## Table of contents
+
+- [How it works](#how-it-works)
+- [Requirements](#requirements)
+- [Quickstart](#quickstart)
+- [NodeConfig CRD](#nodeconfig-crd)
+- [Pod annotations](#pod-annotations)
+- [Use cases](#use-cases)
 - [Comparison vs Calico / Cilium](#comparison-vs-calico--cilium)
 - [Limitations and non-goals](#limitations-and-non-goals)
 - [Building and testing](#building-and-testing)
 - [License](#license)
 ## How it works
 Each node runs a single `flock-agent` DaemonSet pod with three containers:
 - a privileged init container (`flock-installer`) that drops the CNI
  plugin binary into `/opt/cni/bin/flock` and writes
  `/etc/cni/net.d/01-flock.conflist`,
 - the agent itself, which owns IPAM, programs veth pairs, and tracks
  pod readiness, and
 - a [BIRD2](https://bird.network.cz/) sidecar that the agent re-renders
  and reloads when the per-node config or the active anycast set changes.
 Each node has a `NodeConfig` CR (cluster-scoped, name = node name) that
 declares its IPv6 and IPv4 prefixes, its local BGP ASN, and its upstream
 peers. The agent reads the CR via a dynamic informer.
 When kubelet runs the CNI plugin on `ADD`, the plugin opens a unix-socket
 RPC to the agent. The agent allocates an address from the per-node
 CIDRs, creates a veth pair, configures the pod side, persists the
 allocation to `/var/lib/flock/allocations.json`, and returns the result.
 There is no controller loop and no IPAM coordination across nodes — each
 node owns a non-overlapping CIDR and allocates locally.
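
The node-local allocation step described above could be sketched roughly as follows. This is an illustration only, not flock's IPAM: the `pool` type and its `allocate`/`release` methods are hypothetical names, persistence to the state file is left out, and a linear scan is used purely for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// pool is a hypothetical node-local allocator over one per-node prefix.
// Because each node owns a non-overlapping prefix, no cross-node locking
// or coordination is needed.
type pool struct {
	prefix netip.Prefix
	used   map[netip.Addr]string // address -> owning container ID
}

func newPool(cidr string) (*pool, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	return &pool{prefix: p.Masked(), used: map[netip.Addr]string{}}, nil
}

// allocate hands out the lowest free address above the network address.
// A real allocator would also skip reserved addresses (e.g. IPv4 broadcast).
func (p *pool) allocate(owner string) (netip.Addr, error) {
	for a := p.prefix.Addr().Next(); p.prefix.Contains(a); a = a.Next() {
		if _, taken := p.used[a]; !taken {
			p.used[a] = owner
			return a, nil
		}
	}
	return netip.Addr{}, errors.New("pool exhausted")
}

// release frees an address so a later allocate can reuse it.
func (p *pool) release(a netip.Addr) { delete(p.used, a) }

func main() {
	p, err := newPool("2001:db8:f001::/64")
	if err != nil {
		panic(err)
	}
	first, _ := p.allocate("container-a")
	second, _ := p.allocate("container-b")
	fmt.Println(first, second)
}
```

Keeping the allocator purely local is what lets the CNI `ADD` path avoid any round trip to a cluster-wide controller.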
 For anycast, the agent installs `<anycast-ip> via <pod-eth0-ip> dev <veth>`
 host routes on the node and adds the anycast IP to BIRD's BGP export
 filter. When a pod loses readiness, the agent withdraws the route from
 both the kernel and BGP within one reconcile cycle (sub-second).
 ### Packet path
 `pod.eth0` (a veth) ↔ host-side veth (with `addrgenmode none`,
 `fe80::1/64`, proxy-ARP for the v4 default-via) ↔ host kernel ↔ uplink
 NIC ↔ upstream router. No conntrack, no SNAT, no encapsulation.
 For IPv6 the host side of every veth carries the deterministic link-local
 gateway `fe80::1`, so every pod can use a fixed default route. For IPv4
 the host side answers ARP for `169.254.1.1`, providing the same fixed
 default route in v4.
 ## Requirements
 - Linux nodes. flock has not been tested on, and does not target,
  Windows nodes.
 - Kubernetes ≥ 1.27.
 - An upstream router (or pair) that accepts a BGP session from each
  node. flock has been tested with Cisco IOS-XE, Arista EOS, and FRR
  acting as the upstream; anything that speaks standard eBGP should work.
 - Globally routable (or at least datacentre-routable) IPv6 prefix
  delegated to the cluster, sliced into a per-node /64. IPv4 is
  optional but supported.
 - Each node must have a unique local ASN. Private ASNs (`64512–65534`,
  `4200000000–4294967294`) are typical.
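
The ASN requirement lends itself to a quick pre-flight check before applying NodeConfigs. The sketch below is not part of flock; `isPrivateASN` and `duplicateASNs` are hypothetical helpers that test the private-use ranges listed above (defined in RFC 6996) and flag ASNs assigned to more than one node.

```go
package main

import (
	"fmt"
	"sort"
)

// isPrivateASN reports whether asn falls in a private-use range:
// 64512-65534 (16-bit) or 4200000000-4294967294 (32-bit).
func isPrivateASN(asn uint32) bool {
	return (asn >= 64512 && asn <= 65534) ||
		(asn >= 4200000000 && asn <= 4294967294)
}

// duplicateASNs returns, in ascending order, every ASN that is assigned
// to more than one node in the node-name -> ASN map.
func duplicateASNs(byNode map[string]uint32) []uint32 {
	count := map[uint32]int{}
	for _, asn := range byNode {
		count[asn]++
	}
	var dups []uint32
	for asn, n := range count {
		if n > 1 {
			dups = append(dups, asn)
		}
	}
	sort.Slice(dups, func(i, j int) bool { return dups[i] < dups[j] })
	return dups
}

func main() {
	plan := map[string]uint32{"node-a": 65101, "node-b": 65102, "node-c": 65101}
	for node, asn := range plan {
		if !isPrivateASN(asn) {
			fmt.Printf("%s: ASN %d is not in a private-use range\n", node, asn)
		}
	}
	fmt.Println("duplicate ASNs:", duplicateASNs(plan))
}
```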
 ## Quickstart
 ```sh
 # 1. Install CRD + RBAC + DaemonSet (single bundled manifest):
 kubectl apply -f deploy/install.yaml
 # 2. Label the node(s) you want flock to manage:
 kubectl label node <node-name> flock.fritzlab.net/agent=
 # 3. Apply a NodeConfig CR for that node (see "NodeConfig CRD" below):
 kubectl apply -f my-nodeconfig.yaml
 # 4. Verify the agent is up:
 kubectl -n kube-system get pod -l app=flock-agent -o wide
 kubectl -n kube-system exec -it ds/flock-agent -c bird -- \
    birdc -s /run/flock/bird.ctl show protocols
 ```
 The DaemonSet is gated by the `flock.fritzlab.net/agent` node label, so
 unlabelled nodes continue to use whatever CNI was installed before. This
 lets you migrate node-by-node — start with one node, prove it works, then
 proceed.
 ## NodeConfig CRD
 A `NodeConfig` is the only operator-supplied input. One per node, name
 matches the node name. Example:
 ```yaml
 apiVersion: flock.fritzlab.net/v1alpha1
 kind: NodeConfig
 metadata:
  name: node-a
 spec:
  cidr6:
    - 2001:db8:f001::/64       # Pods on this node get addresses from here.
  cidr4:
    - 192.0.2.0/24             # IPv4 pool; used unless IPv4 is turned off for the node or pod.

  defaults:
    ipv6: true                 # Optional. Built-in baseline if omitted.
    ipv4: true                 # Optional. Built-in baseline if omitted.
  bgp:
    asn: 65101                 # This node's local ASN.
    peers:
      - address: 2001:db8::1   # Upstream router (IPv6 session).
        asn: 65000
      - address: 192.0.2.1     # Same router, IPv4 session.
        asn: 65000
 ```
 ### `spec.defaults`
 `spec.defaults` controls which address families a pod *gets by default*
 on this node — i.e. when the pod has no explicit `flock.fritzlab.net/ipv6`
 or `flock.fritzlab.net/ipv4` annotation. Pod annotations always override.
 If you omit `spec.defaults` (or any individual field inside it) flock
 falls back to its built-in baseline of **dual-stack (IPv6 on, IPv4 on)**.
 | Goal                              | `spec.defaults`                        |
 |-----------------------------------|----------------------------------------|
 | Dual-stack (the default)          | omit, or `{ ipv6: true,  ipv4: true }` |
 | IPv6-only node                    | `{ ipv6: true,  ipv4: false }`         |
 | IPv4-only (legacy node)           | `{ ipv6: false, ipv4: true }`          |
 A NodeConfig that resolves to "neither family" is rejected at allocation
 time, so misconfiguring both to false will surface as an error on the
 first `CNI ADD`.
 ### `spec.bgp`
 Each `peer` becomes one BGP session. The agent picks a node-local source
 address on the same subnet as the peer; if there isn't one, BIRD uses
 its default. Multi-homing (multiple peers per family — or per upstream
 router pair) is allowed.
 ## Pod annotations
 All annotations live under `flock.fritzlab.net/`. Every annotation is
 optional; leave them off to inherit the per-node defaults.
 | Annotation                          | Type   | Purpose                                                                                       |
 |-------------------------------------|--------|-----------------------------------------------------------------------------------------------|
 | `flock.fritzlab.net/ipv6`           | bool   | Override `spec.defaults.ipv6` for this pod (`true`/`false`).                                  |
 | `flock.fritzlab.net/ipv4`           | bool   | Override `spec.defaults.ipv4` for this pod (`true`/`false`).                                  |
 | `flock.fritzlab.net/cidr6`          | CIDRs  | Restrict IPv6 allocation to a sub-range of the node's `cidr6`. Comma-separated.               |
 | `flock.fritzlab.net/cidr4`          | CIDRs  | Restrict IPv4 allocation to a sub-range of the node's `cidr4`. Comma-separated.               |
 | `flock.fritzlab.net/ip-algo`        | list   | Embed identity into the IPv6 IID. Subset of `namespace,pod,image`, in order, comma-separated. |
 | `flock.fritzlab.net/anycast`        | IPs    | Bind these IPs on the pod's `lo`; advertise via BGP while pod is `Ready`. Mixed v6+v4 ok.     |
 | `flock.fritzlab.net/addresses`      | IPs    | Bind these IPs on the pod's `eth0`. The first v6 and first v4 **replace** the IPAM allocation for that family, so the supplied address becomes the pod's primary IP. Mixed v6+v4 ok. Single-replica only in practice. |
 Bool values must be the literal strings `"true"` or `"false"`
 (case-insensitive, surrounding whitespace tolerated). Other values —
 `1`, `0`, `yes`, `no` — are rejected so a typo can't silently flip
 behaviour.
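
The strict parsing rule is small enough to sketch. The diff further down calls a helper named `parseBoolAnnotation` whose body is not shown here, so the version below is an assumption about what such a helper might look like, not a copy of it.

```go
package main

import (
	"fmt"
	"strings"
)

const annotationPrefix = "flock.fritzlab.net/"

// parseBoolAnnotation accepts only "true" or "false" (case-insensitive,
// surrounding whitespace ignored). Everything else is an error, so values
// such as "1" or "yes" cannot silently flip an address family.
func parseBoolAnnotation(key, raw string) (bool, error) {
	switch strings.ToLower(strings.TrimSpace(raw)) {
	case "true":
		return true, nil
	case "false":
		return false, nil
	default:
		return false, fmt.Errorf("annotation %s%s: invalid value %q, want true or false",
			annotationPrefix, key, raw)
	}
}

func main() {
	for _, v := range []string{"true", " False ", "yes", "1"} {
		b, err := parseBoolAnnotation("ipv4", v)
		fmt.Printf("%q -> %v, err=%v\n", v, b, err)
	}
}
```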
 ### `addresses` vs `anycast`
 Both annotations bind operator-supplied IPs onto a pod and have flock
 advertise `/128` (or `/32`) per-pod over BGP. The differences are
 where the IP lands and what it's for:
 |                            | `anycast`                                          | `addresses`                                                       |
 |----------------------------|----------------------------------------------------|-------------------------------------------------------------------|
 | Bound on                   | pod `lo`                                           | pod `eth0`                                                        |
 | Multi-replica?             | yes — every Ready replica advertises the same IP and the upstream router ECMPs across them | no — the same IP on multiple replicas is operator error           |
 | Replaces IPAM?             | no — pod still has an IPAM-allocated unicast IP   | **yes** — the first v6 + first v4 in the list become the pod's primary IPs in place of an IPAM allocation |
 | Workload visibility        | only the IPAM IP is on the primary interface     | the public IP is `eth0`'s primary address — workloads that read their own NIC see it (e.g. Plex's remote-access detection) |
 Use `anycast` for shared services with many replicas (DNS, ingress).
 Use `addresses` when one specific pod needs a known public IP that the
 workload itself must see on its primary interface.
 ### Conflict detection
 `addresses` and `anycast` reject pods that supply an IP whose family is
 disabled. If the resolved `WantV4` is false (via the pod's `ipv4`
 annotation or the NodeConfig default) and any addresses- or
 anycast-supplied IP is IPv4, the CNI ADD fails with an explicit error.
 Same for v6. Both annotation types put IPs on a pod interface and rely
 on the family being enabled for return-path routing — silently accepting
 the IP would leave a non-functional pod.
 ### Outside-aggregate advertisement
 When an `addresses` IP replaces IPAM (becomes the pod's primary IP) the
 IP is typically **outside** the node's BGP aggregate (e.g. a public
 `/32` on a node whose pod CIDR is private). flock notices this during
 BGP rendering and advertises the IP individually as a per-pod `/32` or
 `/128` so the upstream router has a route to it.
 ### Example pods
 Default dual-stack — no annotations needed:
 ```yaml
 apiVersion: v1
 kind: Pod
 metadata:
  name: minimal
 ```
 IPv6 only — opt out of the default v4 allocation:
 ```yaml
 apiVersion: v1
 kind: Pod
 metadata:
  name: v6-only
  annotations:
    flock.fritzlab.net/ipv4: "false"
 ```
 Operator-friendly addressing — `fnv(namespace) | fnv(pod) | random`
 packed into the host bits, so a pod's identity is recognisable from
 its IP in `kubectl get pods -o wide`:
 ```yaml
 metadata:
  annotations:
    flock.fritzlab.net/ip-algo: "namespace,pod"
 ```
 Anycast service — three replicas, each advertising the same v6+v4
 anycast pair from the node it lands on. The upstream router does ECMP
 across the active set:
 ```yaml
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: dns
 spec:
  replicas: 3
  template:
    metadata:
      annotations:
        flock.fritzlab.net/anycast: "2001:db8:a::53, 192.0.2.53"
    spec:
      containers:
        - name: coredns
          image: coredns/coredns
          readinessProbe:
            httpGet: { path: /ready, port: 8181 }
            periodSeconds: 1
            failureThreshold: 1
 ```
 Workload with a known public IP — single-replica pod whose application
 inspects its own primary interface (Plex's remote-access flow). The
 addresses become the pod's primary IPs in place of any IPAM allocation;
 the pod's `eth0` ends up with exactly the supplied addresses, and BGP
 advertises them as a `/128` and `/32`:
 ```yaml
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: plex
 spec:
  replicas: 1
  template:
    metadata:
      annotations:
        flock.fritzlab.net/addresses: "2001:db8:c606::166, 192.0.2.166"
    spec:
      containers:
        - name: plex
          image: plexinc/pms-docker
 ```
 ## Use cases
 **Highly-available DNS.** Run N CoreDNS replicas, each annotated with
 the same `anycast` IP. Point client `/etc/resolv.conf` at the anycast
 address. Each replica advertises a `/128` from its own node; the
 upstream router does ECMP. Lose a pod, traffic fails over within a
 probe cycle.
 **Replacing a kube-proxy `ClusterIP`.** Headless Service plus an anycast
 IP gives you a single stable address with load-balancing across pods,
 without the DNAT-pinning that makes long-lived TCP keepalive connections
 stick to one backend forever. ECMP routes each new flow independently.
 **Per-pod public IPv6.** Because every pod has a globally routable IPv6
 address and the cluster does no NAT, a pod's `eth0` IP is reachable from
 the rest of the internet (subject to your firewall). Useful for things
 like outgoing SMTP, where you want a stable from-address per pod, or for
 peer-to-peer protocols that don't tolerate NAT.
 **Fast pod identification in `kubectl`.** With
 `flock.fritzlab.net/ip-algo: namespace,pod` the IPv6 host bits encode
 the pod's namespace+name, so you can recognise a pod from its IP without
 a lookup. Reverse-DNS via a wildcard zone makes those IPs human-readable
 too.
 **Static-IP migration.** Annotation-driven address allocation means you
 can ask for a specific sub-CIDR (`cidr6: 2001:db8:f001::ab00/120`) for
 services that previously needed pinned IPs (mail server, ingress
 controller). When the static-IP requirement goes away, drop the
 annotation and the pod gets a normal allocation.
 ## Comparison vs Calico / Cilium
 |                          | flock                       | Calico                       | Cilium                       |
 |--------------------------|-----------------------------|------------------------------|------------------------------|
 | Default address family   | dual (IPv6+IPv4)            | IPv4                         | dual                         |
 | BGP                      | yes (BIRD)                  | yes                          | optional                     |
 | Overlay (VXLAN/IPIP)     | never                       | optional                     | yes (VXLAN/Geneve) or native |
 | NAT in datapath          | never                       | masquerade by default        | masquerade by default        |
 | Anycast pod addressing   | first-class                 | manual                       | optional, via service mesh   |
 | eBPF datapath            | no                          | optional                     | yes                          |
 | NetworkPolicy            | yes (nftables)              | yes (Felix)                  | yes (eBPF)                   |
 | Cluster size target      | small (< 100 nodes)         | thousands                    | thousands                    |
 | Operational surface area | low (1 DaemonSet, 1 CRD)    | medium                       | high                         |
 | Production-ready         | alpha                       | yes                          | yes                          |
 flock is not trying to compete with Calico or Cilium. The right answer
 for most clusters is one of those two — flock exists for clusters where
 every node already speaks BGP, the operator wants real (no NAT) IPv6
 addressing on every pod, and per-pod anycast is something they actually
 want to use rather than work around.
 ## Limitations and non-goals
 - NetworkPolicy supports `networking.k8s.io/v1` (ingress + egress, all
  three peer types, numeric ports + port ranges). Named ports and
  AdminNetworkPolicy are not yet implemented.
 - No NAT, no masquerade, no SNAT-egress. Pods reach the wider internet
  using their real cluster-routable addresses; if your IPv4 pool isn't
  routable beyond your network, those pods can't reach v4-only hosts on
  the public internet without help from your border router.
 - No multi-cluster, no peering across clusters.
 - Linux-only datapath.
 - IPAM is per-node — there's no global allocator and no IP mobility.
  When a pod moves to a different node it gets a new address.
 - The agent is privileged. It mounts `/var/run/netns`, configures veth
  pairs, manages kernel routes, and holds `CAP_NET_ADMIN`. This is
  inherent to being a CNI; reducing privilege further is not a goal.
 - If BIRD dies but the agent stays up, pods on that node stop being
  reachable from off-node. The DaemonSet liveness probes catch this.
 ## Building and testing
 ```sh
 # Unit tests + fuzz seed corpora (fast, ~1s):
 go test ./...
 # Targeted fuzz pass:
 go test -run NEVERMATCH -fuzz=FuzzParseAnnotations -fuzztime=30s ./pkg/agent
 go test -run NEVERMATCH -fuzz=FuzzRender           -fuzztime=30s ./pkg/routing/bird
 go test -run NEVERMATCH -fuzz=FuzzEmbed            -fuzztime=30s ./pkg/embed
 go test -run NEVERMATCH -fuzz=FuzzIPAM_Allocate    -fuzztime=30s ./pkg/agent
 # Build the container image (used by the DaemonSet):
 docker build -t flock:dev .
 ```
 The fuzz tests are also run as plain unit tests via their seed corpora,
 so every `go test ./...` exercises the discovered edge cases as
 regressions.
 `pkg/agent` has Linux-only files (`*_linux.go`) for netlink and netns
 work; the macOS/Windows build pulls in stubs from `*_stub.go` so tests
 run cleanly on developer laptops.
 ## License
-Apache 2.0.
+Apache 2.0 — see [LICENSE](LICENSE).
@@ -20,6 +20,9 @@ spec:
        openAPIV3Schema:
          type: object
          required: [spec]
          description: |
            NodeConfig is the per-node operator-supplied configuration for the
            flock CNI agent. Its name MUST equal the Kubernetes node name.
          properties:
            spec:
              type: object
@@ -35,6 +38,25 @@ spec:
                  items:
                    type: string
                    description: IPv4 CIDR owned and aggregate-advertised by this node.
                defaults:
                  type: object
                  description: |
                    Per-node baseline for which address families a pod receives
                    when its own annotations don't specify. Pod annotations
                    flock.fritzlab.net/ipv6 and flock.fritzlab.net/ipv4 always
                    override these defaults. Built-in fallback (when this block
                    or any field is omitted) is IPv6=true, IPv4=true (dual-stack).
                  properties:
                    ipv6:
                      type: boolean
                      description: |
                        Default IPv6 inclusion for pods on this node. Omit to
                        inherit the built-in baseline (true).
                    ipv4:
                      type: boolean
                      description: |
                        Default IPv4 inclusion for pods on this node. Omit to
                        inherit the built-in baseline (true).
                bgp:
                  type: object
                  required: [asn, peers]
@@ -70,3 +92,9 @@ spec:
        - name: CIDR4
          type: string
          jsonPath: .spec.cidr4
        - name: DefV6
          type: boolean
          jsonPath: .spec.defaults.ipv6
        - name: DefV4
          type: boolean
          jsonPath: .spec.defaults.ipv4
@@ -41,19 +41,10 @@ spec:
       nodeSelector:
         flock.fritzlab.net/agent: ""
       tolerations:
-        - key: fritzlab.net/cni-test
-          operator: Equal
-          value: "true"
-          effect: NoSchedule
-        - key: node-role.kubernetes.io/control-plane
-          operator: Exists
-          effect: NoSchedule
-        - key: node.kubernetes.io/not-ready
-          operator: Exists
-          effect: NoExecute
-        - key: node.kubernetes.io/unreachable
-          operator: Exists
-          effect: NoExecute
+        # CNI must schedule on a fresh node before it becomes Ready —
+        # the node has not-ready:NoSchedule until flock installs the CNI conflist.
+        # Catch-all tolerates all taints so the agent always runs.
+        - operator: Exists
       initContainers:
         - name: install-cni
           image: code.fritzlab.net/fritzlab/flock:latest
@@ -20,6 +20,9 @@ spec:
        openAPIV3Schema:
          type: object
          required: [spec]
          description: |
            NodeConfig is the per-node operator-supplied configuration for the
            flock CNI agent. Its name MUST equal the Kubernetes node name.
          properties:
            spec:
              type: object
@@ -35,6 +38,25 @@ spec:
                  items:
                    type: string
                    description: IPv4 CIDR owned and aggregate-advertised by this node.
                defaults:
                  type: object
                  description: |
                    Per-node baseline for which address families a pod receives
                    when its own annotations don't specify. Pod annotations
                    flock.fritzlab.net/ipv6 and flock.fritzlab.net/ipv4 always
                    override these defaults. Built-in fallback (when this block
                    or any field is omitted) is IPv6=true, IPv4=true (dual-stack).
                  properties:
                    ipv6:
                      type: boolean
                      description: |
                        Default IPv6 inclusion for pods on this node. Omit to
                        inherit the built-in baseline (true).
                    ipv4:
                      type: boolean
                      description: |
                        Default IPv4 inclusion for pods on this node. Omit to
                        inherit the built-in baseline (true).
                bgp:
                  type: object
                  required: [asn, peers]
@@ -70,6 +92,12 @@ spec:
        - name: CIDR4
          type: string
          jsonPath: .spec.cidr4
        - name: DefV6
          type: boolean
          jsonPath: .spec.defaults.ipv6
        - name: DefV4
          type: boolean
          jsonPath: .spec.defaults.ipv4
 ---
 apiVersion: v1
 kind: ServiceAccount
@@ -91,6 +119,9 @@ rules:
  - apiGroups: ["networking.k8s.io"]
    resources: ["networkpolicies"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/status"]
    verbs: ["patch"]
@@ -151,19 +182,10 @@ spec:
       nodeSelector:
         flock.fritzlab.net/agent: ""
       tolerations:
-        - key: fritzlab.net/cni-test
-          operator: Equal
-          value: "true"
-          effect: NoSchedule
-        - key: node-role.kubernetes.io/control-plane
-          operator: Exists
-          effect: NoSchedule
-        - key: node.kubernetes.io/not-ready
-          operator: Exists
-          effect: NoExecute
-        - key: node.kubernetes.io/unreachable
-          operator: Exists
-          effect: NoExecute
+        # CNI must schedule on a fresh node before it becomes Ready —
+        # the node has not-ready:NoSchedule until flock installs the CNI conflist.
+        # Catch-all tolerates all taints so the agent always runs.
+        - operator: Exists
       initContainers:
         - name: install-cni
           image: code.fritzlab.net/fritzlab/flock:latest
@@ -2,88 +2,241 @@ package agent
 import (
 	"fmt"
 	"log/slog"
 	"net"
 	"strings"
 	flockv1alpha1 "code.fritzlab.net/fritzlab/flock/pkg/api/v1alpha1"
 	"code.fritzlab.net/fritzlab/flock/pkg/embed"
 )
 // annotationPrefix is the namespace under which all flock pod annotations
 // live. Anything not starting with this prefix is ignored by the parser.
 const annotationPrefix = "flock.fritzlab.net/"
-// ParsedAnnotations is the typed view of a Pod's flock annotations.
-type ParsedAnnotations struct {
+// Recognised annotation keys (without the prefix).
+const (
 	annIPv6      = "ipv6"
 	annIPv4      = "ipv4"
 	annCIDR6     = "cidr6"
 	annCIDR4     = "cidr4"
 	annIPAlgo    = "ip-algo"
 	annAnycast   = "anycast"
 	annAddresses = "addresses"
 )
 // FamilyDefaults is the per-call baseline for whether a pod receives an IPv6
 // and/or IPv4 address. It is the merge of:
 //
 //  1. flock's built-in baseline (IPv6=true, IPv4=true — dual-stack), then
 //  2. any NodeConfig.Spec.Defaults override the operator has applied to
 //     the local node.
 //
 // Pod-level `flock.fritzlab.net/ipv{6,4}` annotations override this baseline.
 //
 // Use FamilyDefaultsFromNodeConfig to compute a value from a NodeConfig,
 // or BuiltinFamilyDefaults() if no NodeConfig is in scope.
 type FamilyDefaults struct {
 	// WantV6 is the default-on value for IPv6 inclusion when the pod has no
 	// explicit ipv6 annotation.
 	WantV6 bool
 	// WantV4 is the default-on value for IPv4 inclusion when the pod has no
 	// explicit ipv4 annotation.
 	WantV4 bool
 }
 // BuiltinFamilyDefaults returns flock's hard-coded fallback: dual-stack
 // (IPv6 + IPv4). This is the policy applied when no NodeConfig override is
 // in effect. Pods that want a single family explicitly opt out via the
 // `flock.fritzlab.net/ipv6` or `flock.fritzlab.net/ipv4` annotation, or
 // the operator narrows the fallback at the node level via
 // NodeConfig.Spec.Defaults.
 //
 // We define it as a function rather than a var so callers can't mutate the
 // shared baseline at runtime.
 func BuiltinFamilyDefaults() FamilyDefaults {
 	return FamilyDefaults{WantV6: true, WantV4: true}
 }
 // FamilyDefaultsFromNodeConfig resolves the effective per-node defaults,
 // falling back to BuiltinFamilyDefaults for any field the NodeConfig leaves
 // unset. A nil NodeConfig (or nil Spec.Defaults) returns the built-in
 // baseline unchanged.
 func FamilyDefaultsFromNodeConfig(nc *flockv1alpha1.NodeConfig) FamilyDefaults {
 	out := BuiltinFamilyDefaults()
 	if nc == nil || nc.Spec.Defaults == nil {
 		return out
 	}
 	if nc.Spec.Defaults.IPv6 != nil {
 		out.WantV6 = *nc.Spec.Defaults.IPv6
 	}
 	if nc.Spec.Defaults.IPv4 != nil {
 		out.WantV4 = *nc.Spec.Defaults.IPv4
 	}
 	return out
 }
 // ParsedAnnotations is the typed view of a pod's flock annotations after the
 // node-level defaults have been merged in. All slices are non-nil only when
 // the corresponding annotation was present and parsed cleanly.
 type ParsedAnnotations struct {
 	// WantV6 is true when the pod should receive an IPv6 address.
 	WantV6 bool
 	// WantV4 is true when the pod should receive an IPv4 address.
 	WantV4 bool
 	// CIDR6 narrows IPv6 allocation to specific operator-approved sub-ranges
 	// of the node's CIDR6 set. nil/empty means "use any node CIDR6".
 	CIDR6 []*net.IPNet
 	// CIDR4 narrows IPv4 allocation. nil/empty means "use any node CIDR4".
 	CIDR4 []*net.IPNet
-	IPAlgo   []embed.Field
+	// Anycast is the set of anycast IPs to bind on the pod's loopback.
 	// nil/empty means "no anycast".
 	Anycast []net.IP
 	// Addresses is the set of additional IPs to bind directly on the pod's
 	// eth0. BGP advertisement (/128+/32) is identical to Anycast; the only
 	// difference is that these IPs land on the primary interface instead of
 	// lo. Use this when the workload needs the IP directly visible on eth0
 	// (e.g. Plex, which inspects its own interfaces for remote-access setup).
 	// nil/empty means "no extra addresses".
 	Addresses []net.IP
 }
-// ParseAnnotations applies the design-doc defaults (ipv6=true, ipv4=false)
-// and validates the post-merge combination.
-func ParseAnnotations(in map[string]string) (*ParsedAnnotations, error) {
-	out := &ParsedAnnotations{WantV6: true, WantV4: false}
+// ParseAnnotations applies the supplied per-node defaults and validates the
+// post-merge combination. It is pure — it does not consult NodeConfig or any
+// global state — so it is safe to call from tests and fuzz targets.
+//
 // Annotation precedence: pod annotation > FamilyDefaults > built-in baseline.
 // Callers compute FamilyDefaults via FamilyDefaultsFromNodeConfig and pass it
 // in.
 //
 // Errors:
 //   - any unknown ipv6/ipv4 value (must be "true" or "false", case-insensitive)
 //   - any malformed cidr6/cidr4/anycast/addresses value
 //   - the post-merge combination resolves to neither IPv6 nor IPv4 (a pod
 //     must have at least one address)
 func ParseAnnotations(in map[string]string, defaults FamilyDefaults) (*ParsedAnnotations, error) {
 	out := &ParsedAnnotations{WantV6: defaults.WantV6, WantV4: defaults.WantV4}
-	if v, ok := in[annotationPrefix+"ipv6"]; ok {
-		switch strings.ToLower(strings.TrimSpace(v)) {
-		case "true":
-			out.WantV6 = true
-		case "false":
-			out.WantV6 = false
-		default:
-			return nil, fmt.Errorf("annotation ipv6=%q: must be true or false", v)
-		}
+	if v, ok := in[annotationPrefix+annIPv6]; ok {
+		b, err := parseBoolAnnotation(annIPv6, v)
+		if err != nil {
+			return nil, err
+		}
+		out.WantV6 = b
 	}
-	if v, ok := in[annotationPrefix+"ipv4"]; ok {
-		switch strings.ToLower(strings.TrimSpace(v)) {
-		case "true":
-			out.WantV4 = true
-		case "false":
-			out.WantV4 = false
-		default:
-			return nil, fmt.Errorf("annotation ipv4=%q: must be true or false", v)
-		}
+	if v, ok := in[annotationPrefix+annIPv4]; ok {
+		b, err := parseBoolAnnotation(annIPv4, v)
+		if err != nil {
+			return nil, err
+		}
+		out.WantV4 = b
 	}
 	if !out.WantV6 && !out.WantV4 {
-		return nil, fmt.Errorf("ipv6=false requires ipv4=true (pod must have at least one address)")
+		return nil, fmt.Errorf("annotations + defaults resolve to no address family (need at least one of ipv6/ipv4)")
 	}
-	if v, ok := in[annotationPrefix+"cidr6"]; ok {
+	if v, ok := in[annotationPrefix+annCIDR6]; ok {
-		nets, err := parseCIDRList(v)
+		nets, err := parseCIDRList(v, familyV6)
 		if err != nil {
-			return nil, fmt.Errorf("annotation cidr6: %w", err)
+			return nil, fmt.Errorf("annotation %s: %w", annCIDR6, err)
 		}
 		out.CIDR6 = nets
 	}
-	if v, ok := in[annotationPrefix+"cidr4"]; ok {
+	if v, ok := in[annotationPrefix+annCIDR4]; ok {
-		nets, err := parseCIDRList(v)
+		nets, err := parseCIDRList(v, familyV4)
 		if err != nil {
-			return nil, fmt.Errorf("annotation cidr4: %w", err)
+			return nil, fmt.Errorf("annotation %s: %w", annCIDR4, err)
 		}
 		out.CIDR4 = nets
 	}
-	if v, ok := in[annotationPrefix+"ip-algo"]; ok {
-		fields, err := parseIPAlgo(v)
-		if err != nil {
-			return nil, fmt.Errorf("annotation ip-algo: %w", err)
-		}
-		out.IPAlgo = fields
-	}
-	if v, ok := in[annotationPrefix+"anycast"]; ok {
+	if v, ok := in[annotationPrefix+annAnycast]; ok {
 		ips, err := parseIPList(v)
 		if err != nil {
-			return nil, fmt.Errorf("annotation anycast: %w", err)
+			return nil, fmt.Errorf("annotation %s: %w", annAnycast, err)
 		}
 		out.Anycast = ips
 	}
 	if v, ok := in[annotationPrefix+annAddresses]; ok {
 		ips, err := parseIPList(v)
 		if err != nil {
 			return nil, fmt.Errorf("annotation %s: %w", annAddresses, err)
 		}
 		out.Addresses = ips
 	}
 	// Reject pods that ask for an addresses- or anycast-supplied IP whose
 	// family was disabled (via the pod's ipv6/ipv4 annotation or NodeConfig
 	// default). Both annotation types put the IP on a pod interface and rely
 	// on the family being enabled for return-path routing — addresses needs
 	// the in-pod default v6/v4 route to send replies; anycast on lo needs
 	// the same default route on eth0 for the same reason. Silently accepting
 	// the IP would leave a non-functional pod, so we fail closed at ADD.
 	for _, ip := range out.Addresses {
 		if err := requireFamilyEnabled(ip, out.WantV6, out.WantV4, annAddresses); err != nil {
 			return nil, err
 		}
 	}
 	for _, ip := range out.Anycast {
 		if err := requireFamilyEnabled(ip, out.WantV6, out.WantV4, annAnycast); err != nil {
 			return nil, err
 		}
 	}
 	return out, nil
 }
-func parseCIDRList(s string) ([]*net.IPNet, error) {
+// requireFamilyEnabled returns an error when ip's family was opted out via
 // the resolved WantV6/WantV4 booleans (pod annotation > NodeConfig default >
 // built-in dual-stack). The source string identifies which annotation
 // supplied the conflicting IP so the operator's error message is specific.
 func requireFamilyEnabled(ip net.IP, wantV6, wantV4 bool, source string) error {
 	if ip.To4() != nil {
 		if !wantV4 {
 			return fmt.Errorf("annotation %s: contains IPv4 %s but ipv4 is disabled (annotation or NodeConfig default)", source, ip)
 		}
 		return nil
 	}
 	if !wantV6 {
 		return fmt.Errorf("annotation %s: contains IPv6 %s but ipv6 is disabled (annotation or NodeConfig default)", source, ip)
 	}
 	return nil
 }
 // parseBoolAnnotation accepts only "true" or "false" (case-insensitive,
 // surrounding whitespace tolerated). All other values — including "1", "0",
 // "yes", "no" — are rejected so operator typos are caught loudly rather
 // than silently producing the "false" default.
 func parseBoolAnnotation(key, v string) (bool, error) {
 	switch strings.ToLower(strings.TrimSpace(v)) {
 	case "true":
 		return true, nil
 	case "false":
 		return false, nil
 	default:
 		return false, fmt.Errorf("annotation %s=%q: must be \"true\" or \"false\"", key, v)
 	}
 }
 // addressFamily distinguishes IPv6 vs IPv4 in places where the parser must
 // validate the family of supplied CIDRs.
 type addressFamily int
 const (
 	familyAny addressFamily = iota
 	familyV6
 	familyV4
 )
 // parseCIDRList parses a comma-separated CIDR list. Whitespace around items
 // is trimmed; empty items are silently dropped. The list must contain at
 // least one entry post-trim.
 //
 // If `want` is familyV6 or familyV4 each entry's family is checked and a
 // mismatch is reported, so an `flock.fritzlab.net/cidr6` annotation cannot
 // silently slip a v4 prefix into the v6 allocator.
 func parseCIDRList(s string, want addressFamily) ([]*net.IPNet, error) {
 	var out []*net.IPNet
 	for _, part := range strings.Split(s, ",") {
 		part = strings.TrimSpace(part)
@@ -94,6 +247,17 @@ func parseCIDRList(s string) ([]*net.IPNet, error) {
 		if err != nil {
 			return nil, fmt.Errorf("invalid CIDR %q: %w", part, err)
 		}
 		isV4 := n.IP.To4() != nil
 		switch want {
 		case familyV6:
 			if isV4 {
 				return nil, fmt.Errorf("CIDR %q is IPv4, expected IPv6", part)
 			}
 		case familyV4:
 			if !isV4 {
 				return nil, fmt.Errorf("CIDR %q is IPv6, expected IPv4", part)
 			}
 		}
 		out = append(out, n)
 	}
 	if len(out) == 0 {
@@ -102,6 +266,9 @@ func parseCIDRList(s string) ([]*net.IPNet, error) {
 	return out, nil
 }
 // parseIPList parses a comma-separated literal-IP list. Same trim/empty
 // semantics as parseCIDRList. Mixed v4 and v6 entries are allowed (anycast
 // pods can advertise both families together).
 func parseIPList(s string) ([]net.IP, error) {
 	var out []net.IP
 	for _, part := range strings.Split(s, ",") {
@@ -121,31 +288,89 @@ func parseIPList(s string) ([]net.IP, error) {
 	return out, nil
 }
-func parseIPAlgo(s string) ([]embed.Field, error) {
-	var out []embed.Field
-	for _, part := range strings.Split(s, ",") {
-		part = strings.TrimSpace(part)
-		switch part {
-		case "":
-			continue
-		case "namespace":
-			out = append(out, embed.FieldNamespace)
-		case "pod":
-			out = append(out, embed.FieldPod)
-		case "image":
-			out = append(out, embed.FieldImage)
-		default:
-			return nil, fmt.Errorf("unknown ip-algo field %q (allowed: namespace, pod, image)", part)
-		}
-	}
-	if len(out) == 0 {
-		return nil, fmt.Errorf("empty ip-algo")
-	}
-	return out, nil
-}
+// ResolveIPAlgo resolves the effective ip-algo for a pod. Precedence:
+//
+//	pod annotation → NodeConfig annotation → nil (random IID).
+//
+// Empty, missing, or invalid annotations at any level fall through to the
+// next. Invalid input emits a warning via log; a nil log is silent. A nil
+// return value means "no algo, generate a fully random IID".
+//
+// "Invalid" is everything tryParseIPAlgo cannot turn into a non-empty,
+// duplicate-free subset of {namespace, app, image}: unrecognised tokens,
+// duplicates, lists that resolve to zero fields after trimming.
+func ResolveIPAlgo(podAnn, nodeAnn map[string]string, log *slog.Logger) []embed.Field {
+	if v, ok := podAnn[annotationPrefix+annIPAlgo]; ok {
+		if fields := tryParseIPAlgo(v); fields != nil {
+			return fields
+		}
+		warnIPAlgo(log, "pod", v)
+	}
+	if v, ok := nodeAnn[annotationPrefix+annIPAlgo]; ok {
+		if fields := tryParseIPAlgo(v); fields != nil {
+			return fields
+		}
+		warnIPAlgo(log, "NodeConfig", v)
+	}
+	return nil
+}
-// CNIArgs parses the K=V;K=V CNI_ARGS string for the kubelet keys we care
-// about. Other keys are ignored.
+// warnIPAlgo logs a single warning when an ip-algo annotation is present
+// but cannot be parsed. Empty values are not worth a warn — they are
 // indistinguishable from "key absent" by the user's design rule, so we
 // only warn when a non-empty value failed parsing.
 func warnIPAlgo(log *slog.Logger, source, value string) {
 	if log == nil {
 		return
 	}
 	if strings.TrimSpace(value) == "" {
 		return
 	}
 	log.Warn("ignoring invalid ip-algo annotation; falling through",
 		"source", source, "value", value)
 }
 // tryParseIPAlgo parses an ip-algo annotation value under the relaxed
 // "invalid → unset" rules. Returns nil for: empty input, unrecognised
 // tokens, duplicate fields, or anything that resolves to zero fields after
 // trimming. Returns the ordered field list otherwise.
 //
 // Duplicates collapse to nil rather than dedup-and-keep so the operator
 // notices their malformed annotation via the warn log instead of silently
 // losing a field they thought they had specified.
 func tryParseIPAlgo(s string) []embed.Field {
 	var out []embed.Field
 	seen := map[embed.Field]struct{}{}
 	for _, part := range strings.Split(s, ",") {
 		part = strings.TrimSpace(part)
 		if part == "" {
 			continue
 		}
 		var f embed.Field
 		switch part {
 		case string(embed.FieldNamespace):
 			f = embed.FieldNamespace
 		case string(embed.FieldApp):
 			f = embed.FieldApp
 		case string(embed.FieldImage):
 			f = embed.FieldImage
 		default:
 			return nil
 		}
 		if _, dup := seen[f]; dup {
 			return nil
 		}
 		seen[f] = struct{}{}
 		out = append(out, f)
 	}
 	if len(out) == 0 {
 		return nil
 	}
 	return out
 }
 // CNIArgs is the typed view of the K=V;K=V CNI_ARGS string passed by kubelet.
 // We only keep the fields the agent uses; unknown keys are ignored.
 type CNIArgs struct {
 	PodNamespace string
 	PodName      string
@@ -153,6 +378,10 @@ type CNIArgs struct {
 	InfraID      string
 }
 // ParseCNIArgs is permissive by design — kubelet versions and runtime
 // shims pass varying sets of keys. Malformed entries are skipped silently
 // rather than failing the whole ADD; required-key validation is the
 // caller's responsibility.
 func ParseCNIArgs(s string) CNIArgs {
 	var a CNIArgs
 	for _, kv := range strings.Split(s, ";") {
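The hunk above cuts off before the body of ParseCNIArgs, so the permissive behaviour described in its comment is not visible here. The following is only a minimal sketch of what such a parser could look like. The names `cniArgs` and `parseCNIArgsSketch` are hypothetical, the struct holds only the fields visible in the diff plus nothing else, and the key names are taken from the test strings in this commit rather than from the real implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// cniArgs is a hypothetical, reduced stand-in for CNIArgs. The real struct
// has at least one more field that the diff does not show.
type cniArgs struct {
	PodNamespace string
	PodName      string
	InfraID      string
}

// parseCNIArgsSketch splits a K=V;K=V string and keeps only the keys it
// knows. Entries without "=" and unknown keys are skipped, never reported.
func parseCNIArgsSketch(s string) cniArgs {
	var a cniArgs
	for _, kv := range strings.Split(s, ";") {
		k, v, ok := strings.Cut(kv, "=")
		if !ok {
			continue // no "=" at all: malformed entry, skip it
		}
		switch k {
		case "K8S_POD_NAMESPACE":
			a.PodNamespace = v
		case "K8S_POD_NAME":
			a.PodName = v
		case "K8S_POD_INFRA_CONTAINER_ID":
			a.InfraID = v
		}
		// Any other key (including the empty key in "=novalue") is ignored.
	}
	return a
}

func main() {
	a := parseCNIArgsSketch(";;K8S_POD_NAMESPACE=ns;noequalshere;=novalue;K8S_POD_NAME=p")
	fmt.Printf("%+v\n", a)
}
```

Because the sketch never returns an error, checking that the required keys are actually present stays with the caller, which matches the contract stated in the ParseCNIArgs comment.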
@@ -0,0 +1,145 @@
 package agent
 import (
 	"testing"
 )
 // FuzzParseAnnotations explores the joint space of {ipv6, ipv4, cidr6, cidr4,
 // anycast} annotations with random byte strings. ip-algo is handled by
 // ResolveIPAlgo (separate fuzz target below) and is no longer touched by
 // ParseAnnotations. Every recognised key is exercised by deriving a
 // deterministic input map from the fuzzed bytes.
 //
 // Properties checked:
 //
 //  1. The parser never panics on any input.
 //  2. On nil-error return, the result satisfies the design-doc invariant
 //     that at least one of WantV6 / WantV4 is true (a pod always has at
 //     least one address).
 //  3. Anycast IPs and CIDR slices are non-empty only when the
 //     annotation was supplied; never spontaneously populated.
 //
 // Seed corpus covers known edge cases the spec must handle.
 func FuzzParseAnnotations(f *testing.F) {
 	// Seeds: each entry is five strings — the literal raw values for the
 	// five parsed keys. Empty string for "key absent".
 	type seed struct {
 		ipv6, ipv4, cidr6, cidr4, anycast string
 	}
 	seeds := []seed{
 		{},
 		{ipv4: "true"},
 		{ipv6: "false", ipv4: "true"},
 		{ipv6: "TRUE"},
 		{ipv6: "  true "},
 		{ipv6: "yes"},                          // invalid → expect error
 		{ipv4: "1"},                            // invalid
 		{cidr6: ""},                            // empty: the harness treats this as "key absent"
 		{cidr6: ","},                           // invalid (empty after trim)
 		{cidr6: "2602:817:3000:f001::/64"},     // valid single
 		{cidr6: "2602:817:3000:f001::/64,"},    // trailing comma
 		{cidr6: " 2602:817:3000:f001::/64 "},   // surrounding whitespace
 		{cidr6: "2602:817:3000:f001::/64, 2602:817:3000:f002::/64"},
 		{cidr6: "10.0.0.0/8"},                                    // family mismatch
 		{cidr4: "172.25.210.0/24"},                               // valid
 		{cidr4: "172.25.210.0/24,172.25.211.0/24"},              // multiple
 		{cidr4: "2602:817::/32"},                                 // family mismatch
 		{anycast: "2602:817:3000:ac::1"},
 		{anycast: "2602:817:3000:ac::1, 172.25.255.1"},
 		{anycast: "::1"},                              // loopback (allowed at parse time)
 		{anycast: "fe80::1"},                          // link-local (allowed at parse time)
 		{anycast: "::ffff:10.0.0.1"},                  // v4-mapped v6
 		{anycast: "0.0.0.0"},                          // unspecified
 		{anycast: "definitely-not-an-ip"},             // invalid
 		{anycast: ""},                                  // empty: the harness treats this as "key absent"
 		// Embedded NUL bytes
 		{ipv4: "true\x00"},
 		{cidr6: "2602:817:3000:f001::/64\x00"},
 		{anycast: "\x00\x00"},
 		// Unicode
 		{ipv4: "trüe"},
 		// Very long
 		{cidr6: longString("2602:817:3000:f001::/64,", 4096)},
 	}
 	for _, s := range seeds {
 		f.Add(s.ipv6, s.ipv4, s.cidr6, s.cidr4, s.anycast)
 	}
 	f.Fuzz(func(t *testing.T, ipv6, ipv4, cidr6, cidr4, anycast string) {
 		in := map[string]string{}
 		// Treat empty as "key absent" so the seed table matches the run-time
 		// shape; Kubernetes annotations cannot have a nil value but they CAN
 		// be missing entirely. Empty-string-with-key is also a real case
 		// (operator typo); add a separate seed below to cover it.
 		if ipv6 != "" {
 			in[annotationPrefix+annIPv6] = ipv6
 		}
 		if ipv4 != "" {
 			in[annotationPrefix+annIPv4] = ipv4
 		}
 		if cidr6 != "" {
 			in[annotationPrefix+annCIDR6] = cidr6
 		}
 		if cidr4 != "" {
 			in[annotationPrefix+annCIDR4] = cidr4
 		}
 		if anycast != "" {
 			in[annotationPrefix+annAnycast] = anycast
 		}
 		got, err := ParseAnnotations(in, BuiltinFamilyDefaults())
 		if err != nil {
 			return // any error is acceptable; we only require no panic
 		}
 		// Property: at least one family must be selected.
 		if !got.WantV6 && !got.WantV4 {
 			t.Fatalf("parser accepted but produced no family: in=%#v", in)
 		}
 		// Property: optional fields populated only when their key was set.
 		if _, hasAny := in[annotationPrefix+annAnycast]; !hasAny && len(got.Anycast) != 0 {
 			t.Fatalf("Anycast populated without annotation")
 		}
 		if _, hasC6 := in[annotationPrefix+annCIDR6]; !hasC6 && len(got.CIDR6) != 0 {
 			t.Fatalf("CIDR6 populated without annotation")
 		}
 		if _, hasC4 := in[annotationPrefix+annCIDR4]; !hasC4 && len(got.CIDR4) != 0 {
 			t.Fatalf("CIDR4 populated without annotation")
 		}
 	})
 }
 // FuzzParseCNIArgs requires the parser to never panic on adversarial inputs.
 // The parser is permissive by spec — it returns a CNIArgs with whatever it
 // could extract — so the only invariant is "doesn't crash".
 func FuzzParseCNIArgs(f *testing.F) {
 	f.Add("")
 	f.Add("=")
 	f.Add(";")
 	f.Add(";=;=;")
 	f.Add("K8S_POD_NAMESPACE=ns;K8S_POD_NAME=p")
 	f.Add("K8S_POD_NAMESPACE=ns;K8S_POD_NAME=p;K8S_POD_UID=abc;K8S_POD_INFRA_CONTAINER_ID=def")
 	f.Add("=value-only")
 	f.Add("key-only=")
 	f.Add("\x00\x00\x00")
 	f.Add("K8S_POD_NAMESPACE=\xff\xfe\xfd")
 	f.Add("K8S_POD_NAME=value;K8S_POD_NAME=other") // duplicate keys: last wins
 	// Long input
 	f.Add(longString("K8S_POD_NAME=x;", 4096))
 	f.Fuzz(func(t *testing.T, in string) {
 		_ = ParseCNIArgs(in)
 	})
 }
 // longString returns s repeated to total >= n bytes, useful for piling up
 // realistic-looking but oversized inputs.
 func longString(s string, n int) string {
 	if len(s) == 0 {
 		return ""
 	}
 	var b []byte
 	for len(b) < n {
 		b = append(b, s...)
 	}
 	return string(b)
 }
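Several cidr6/cidr4 seeds in the fuzz file above depend on the family check in parseCIDRList, whose loop is split across hunk boundaries in this diff. The sketch below shows one way the trim, skip-empty, parse, and family-check steps could fit together. `family`, `famV6`, `famV4` and `parseCIDRListSketch` are hypothetical names, and the "empty CIDR list" message is an assumption because the real one falls outside the visible hunk.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// family is a hypothetical stand-in for the addressFamily type in the diff.
type family int

const (
	famAny family = iota
	famV6
	famV4
)

// parseCIDRListSketch parses a comma-separated CIDR list, drops empty items,
// and rejects entries whose family does not match want.
func parseCIDRListSketch(s string, want family) ([]*net.IPNet, error) {
	var out []*net.IPNet
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue // tolerate trailing commas and blank items
		}
		_, n, err := net.ParseCIDR(part)
		if err != nil {
			return nil, fmt.Errorf("invalid CIDR %q: %w", part, err)
		}
		isV4 := n.IP.To4() != nil
		if want == famV6 && isV4 {
			return nil, fmt.Errorf("CIDR %q is IPv4, expected IPv6", part)
		}
		if want == famV4 && !isV4 {
			return nil, fmt.Errorf("CIDR %q is IPv6, expected IPv4", part)
		}
		out = append(out, n)
	}
	if len(out) == 0 {
		// Assumed wording; the real message is not shown in the diff.
		return nil, fmt.Errorf("empty CIDR list")
	}
	return out, nil
}

func main() {
	nets, err := parseCIDRListSketch(" 2602:817:3000:f001::/64, 2602:817:3000:f002::/64,", famV6)
	fmt.Println(len(nets), err)
	_, err = parseCIDRListSketch("10.0.0.0/8", famV6)
	fmt.Println(err)
}
```

With `famAny` neither family branch fires, so mixed lists pass through unchanged. That matches the diff's three-valued enum, although no visible caller passes `familyAny`.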
@@ -3,23 +3,125 @@ package agent
 import (
 	"testing"
 	flockv1alpha1 "code.fritzlab.net/fritzlab/flock/pkg/api/v1alpha1"
 	"code.fritzlab.net/fritzlab/flock/pkg/embed"
 )
-func TestParseAnnotations_Defaults(t *testing.T) {
-	a, err := ParseAnnotations(nil)
+// boolPtr returns a pointer to b — convenient for the *bool pointer fields
+// in FamilyDefaults where nil means "unset".
 func boolPtr(b bool) *bool { return &b }
 func TestBuiltinFamilyDefaults(t *testing.T) {
 	d := BuiltinFamilyDefaults()
 	if !d.WantV6 || !d.WantV4 {
 		t.Fatalf("built-in defaults wrong: v6=%v v4=%v (want dual-stack true/true)", d.WantV6, d.WantV4)
 	}
 }
 func TestFamilyDefaultsFromNodeConfig_NilNodeConfig(t *testing.T) {
 	d := FamilyDefaultsFromNodeConfig(nil)
 	if d != BuiltinFamilyDefaults() {
 		t.Fatalf("nil NodeConfig should yield built-in defaults; got %+v", d)
 	}
 }
 func TestFamilyDefaultsFromNodeConfig_NilDefaults(t *testing.T) {
 	nc := &flockv1alpha1.NodeConfig{}
 	d := FamilyDefaultsFromNodeConfig(nc)
 	if d != BuiltinFamilyDefaults() {
 		t.Fatalf("missing Defaults should yield built-in; got %+v", d)
 	}
 }
 func TestFamilyDefaultsFromNodeConfig_PartialOverride(t *testing.T) {
 	nc := &flockv1alpha1.NodeConfig{
 		Spec: flockv1alpha1.NodeConfigSpec{
 			Defaults: &flockv1alpha1.FamilyDefaults{
 				IPv4: boolPtr(false),
 			},
 		},
 	}
 	d := FamilyDefaultsFromNodeConfig(nc)
 	// IPv6 unset → keeps built-in true; IPv4 explicitly set to false →
 	// node opts the family off. Validates that an explicit false beats
 	// the dual-stack baseline rather than being silently overridden.
 	if !d.WantV6 || d.WantV4 {
 		t.Fatalf("partial override wrong: %+v (want v6=true, v4=false)", d)
 	}
 }
 func TestFamilyDefaultsFromNodeConfig_FullOverride(t *testing.T) {
 	nc := &flockv1alpha1.NodeConfig{
 		Spec: flockv1alpha1.NodeConfigSpec{
 			Defaults: &flockv1alpha1.FamilyDefaults{
 				IPv6: boolPtr(false),
 				IPv4: boolPtr(true),
 			},
 		},
 	}
 	d := FamilyDefaultsFromNodeConfig(nc)
 	if d.WantV6 || !d.WantV4 {
 		t.Fatalf("full override wrong: %+v (want v6=false, v4=true)", d)
 	}
 }
 func TestParseAnnotations_BuiltinDefaults(t *testing.T) {
 	// Built-in baseline is dual-stack — no annotation needed.
 	a, err := ParseAnnotations(nil, BuiltinFamilyDefaults())
 	if err != nil {
 		t.Fatal(err)
 	}
 	if !a.WantV6 || !a.WantV4 {
 		t.Fatalf("expected dual-stack default, got v6=%v v4=%v", a.WantV6, a.WantV4)
 	}
 }
 // TestParseAnnotations_OptOutV4 — pods that want IPv6 only must opt out
 // explicitly via the ipv4 annotation now that the built-in is dual-stack.
 func TestParseAnnotations_OptOutV4(t *testing.T) {
 	a, err := ParseAnnotations(map[string]string{
 		annotationPrefix + "ipv4": "false",
 	}, BuiltinFamilyDefaults())
 	if err != nil {
 		t.Fatal(err)
 	}
 	if !a.WantV6 || a.WantV4 {
-		t.Fatalf("defaults wrong: v6=%v v4=%v", a.WantV6, a.WantV4)
+		t.Fatalf("ipv4=false override failed: v6=%v v4=%v", a.WantV6, a.WantV4)
 	}
 }
-func TestParseAnnotations_DualStack(t *testing.T) {
+func TestParseAnnotations_NodeDefaultsApplied(t *testing.T) {
 	// Node config says "IPv4 is on by default for this node".
 	d := FamilyDefaults{WantV6: true, WantV4: true}
 	a, err := ParseAnnotations(nil, d)
 	if err != nil {
 		t.Fatal(err)
 	}
 	if !a.WantV6 || !a.WantV4 {
 		t.Fatalf("node defaults not applied: %+v", a)
 	}
 }
 func TestParseAnnotations_AnnotationOverridesNodeDefault(t *testing.T) {
 	// Node says dual-stack by default; pod opts out of v4 explicitly.
 	d := FamilyDefaults{WantV6: true, WantV4: true}
 	a, err := ParseAnnotations(map[string]string{
 		annotationPrefix + "ipv4": "false",
 	}, d)
 	if err != nil {
 		t.Fatal(err)
 	}
 	if !a.WantV6 || a.WantV4 {
 		t.Fatalf("annotation override failed: %+v", a)
 	}
 }
 func TestParseAnnotations_DualStackViaAnnotation(t *testing.T) {
 	// Same as built-in default; explicit ipv4=true is a no-op now but must
 	// still parse cleanly.
 	a, err := ParseAnnotations(map[string]string{
 		annotationPrefix + "ipv4": "true",
-	})
+	}, BuiltinFamilyDefaults())
 	if err != nil {
 		t.Fatal(err)
 	}
@@ -29,35 +131,152 @@ func TestParseAnnotations_DualStack(t *testing.T) {
 }
 func TestParseAnnotations_NoFamily(t *testing.T) {
 	// Pod opts out of both families → must be rejected.
 	if _, err := ParseAnnotations(map[string]string{
 		annotationPrefix + "ipv6": "false",
-	}); err == nil {
-		t.Fatalf("expected error: ipv6=false ipv4=false")
+		annotationPrefix + "ipv4": "false",
+	}, BuiltinFamilyDefaults()); err == nil {
+		t.Fatalf("expected error when pod opts out of both families")
 	}
 }
-func TestParseAnnotations_IPAlgo(t *testing.T) {
-	a, err := ParseAnnotations(map[string]string{
-		annotationPrefix + "ip-algo": "namespace,pod,image",
-	})
+func TestParseAnnotations_NoFamily_NodeDefaultsAlsoOff(t *testing.T) {
+	// Pathological NodeConfig that disables both families. Even with no pod
+	// annotation we must reject — otherwise a pod gets an empty allocation.
+	d := FamilyDefaults{WantV6: false, WantV4: false}
 	if _, err := ParseAnnotations(nil, d); err == nil {
 		t.Fatalf("expected error when both defaults are false")
 	}
 }
 func TestParseAnnotations_BoolStrictness(t *testing.T) {
 	// Common misuses that should be rejected so typos don't silently flip
 	// behaviour to the implicit-false default.
 	bad := []string{"1", "0", "yes", "no", "TrueFalse", " "}
 	for _, v := range bad {
 		_, err := ParseAnnotations(map[string]string{
 			annotationPrefix + "ipv4": v,
 		}, BuiltinFamilyDefaults())
 		if err == nil {
 			t.Errorf("expected error for ipv4=%q", v)
 		}
 	}
 }
 func TestParseAnnotations_BoolCaseInsensitive(t *testing.T) {
 	for _, v := range []string{"TRUE", "True", "  true  ", "FALSE", "False"} {
 		_, err := ParseAnnotations(map[string]string{
 			annotationPrefix + "ipv4": v,
 		}, BuiltinFamilyDefaults())
 		if err != nil {
-		t.Fatal(err)
+			t.Errorf("expected ipv4=%q to parse cleanly: %v", v, err)
-	}
-	want := []embed.Field{embed.FieldNamespace, embed.FieldPod, embed.FieldImage}
-	if len(a.IPAlgo) != len(want) {
-		t.Fatalf("ip-algo len=%d, want %d", len(a.IPAlgo), len(want))
-	}
-	for i := range want {
-		if a.IPAlgo[i] != want[i] {
-			t.Fatalf("ip-algo[%d]=%s, want %s", i, a.IPAlgo[i], want[i])
 		}
 	}
 }
 // ResolveIPAlgo: precedence is pod → node → nil. Empty / missing / invalid
 // at any level falls through to the next under the relaxed user-defined rule
 // "all three mean unset".
 func TestResolveIPAlgo_PodWins(t *testing.T) {
 	pod := map[string]string{annotationPrefix + annIPAlgo: "namespace,app"}
 	node := map[string]string{annotationPrefix + annIPAlgo: "image"}
 	got := ResolveIPAlgo(pod, node, nil)
 	want := []embed.Field{embed.FieldNamespace, embed.FieldApp}
 	if !equalFields(got, want) {
 		t.Fatalf("got %v, want %v", got, want)
 	}
 }
 func TestResolveIPAlgo_PodAbsentFallsToNode(t *testing.T) {
 	node := map[string]string{annotationPrefix + annIPAlgo: "image"}
 	got := ResolveIPAlgo(nil, node, nil)
 	want := []embed.Field{embed.FieldImage}
 	if !equalFields(got, want) {
 		t.Fatalf("got %v, want %v", got, want)
 	}
 }
 func TestResolveIPAlgo_PodEmptyFallsToNode(t *testing.T) {
 	pod := map[string]string{annotationPrefix + annIPAlgo: ""}
 	node := map[string]string{annotationPrefix + annIPAlgo: "image"}
 	got := ResolveIPAlgo(pod, node, nil)
 	want := []embed.Field{embed.FieldImage}
 	if !equalFields(got, want) {
 		t.Fatalf("got %v, want %v", got, want)
 	}
 }
 func TestResolveIPAlgo_PodInvalidFallsToNode(t *testing.T) {
 	for _, podVal := range []string{"namespace,bogus", "ns", ",", "namespace,namespace"} {
 		pod := map[string]string{annotationPrefix + annIPAlgo: podVal}
 		node := map[string]string{annotationPrefix + annIPAlgo: "app"}
 		got := ResolveIPAlgo(pod, node, nil)
 		want := []embed.Field{embed.FieldApp}
 		if !equalFields(got, want) {
 			t.Fatalf("podVal=%q: got %v, want %v", podVal, got, want)
 		}
 	}
 }
 func TestResolveIPAlgo_BothInvalidReturnsNil(t *testing.T) {
 	pod := map[string]string{annotationPrefix + annIPAlgo: "bogus"}
 	node := map[string]string{annotationPrefix + annIPAlgo: "also-bogus"}
 	if got := ResolveIPAlgo(pod, node, nil); got != nil {
 		t.Fatalf("got %v, want nil", got)
 	}
 }
 func TestResolveIPAlgo_BothAbsentReturnsNil(t *testing.T) {
 	if got := ResolveIPAlgo(nil, nil, nil); got != nil {
 		t.Fatalf("got %v, want nil", got)
 	}
 }
 func TestResolveIPAlgo_NilNodeMap(t *testing.T) {
 	pod := map[string]string{annotationPrefix + annIPAlgo: "image"}
 	got := ResolveIPAlgo(pod, nil, nil)
 	want := []embed.Field{embed.FieldImage}
 	if !equalFields(got, want) {
 		t.Fatalf("got %v, want %v", got, want)
 	}
 }
 func TestResolveIPAlgo_Whitespace(t *testing.T) {
 	pod := map[string]string{annotationPrefix + annIPAlgo: " namespace , app "}
 	got := ResolveIPAlgo(pod, nil, nil)
 	want := []embed.Field{embed.FieldNamespace, embed.FieldApp}
 	if !equalFields(got, want) {
 		t.Fatalf("got %v, want %v", got, want)
 	}
 }
 func TestResolveIPAlgo_DuplicateInvalidates(t *testing.T) {
 	pod := map[string]string{annotationPrefix + annIPAlgo: "app,app"}
 	node := map[string]string{annotationPrefix + annIPAlgo: "namespace"}
 	got := ResolveIPAlgo(pod, node, nil)
 	want := []embed.Field{embed.FieldNamespace}
 	if !equalFields(got, want) {
 		t.Fatalf("got %v, want %v (duplicate must collapse to invalid)", got, want)
 	}
 }
 func equalFields(a, b []embed.Field) bool {
 	if len(a) != len(b) {
 		return false
 	}
 	for i := range a {
 		if a[i] != b[i] {
 			return false
 		}
 	}
 	return true
 }
 func TestParseAnnotations_CIDR(t *testing.T) {
 	a, err := ParseAnnotations(map[string]string{
 		annotationPrefix + "cidr6": "2602:817:3000:f001::/64, 2602:817:3000:f002::/64",
-	})
+	}, BuiltinFamilyDefaults())
 	if err != nil {
 		t.Fatal(err)
 	}
@@ -66,9 +285,140 @@ func TestParseAnnotations_CIDR(t *testing.T) {
 	}
 }
 func TestParseAnnotations_CIDR_FamilyMismatch(t *testing.T) {
 	// v4 prefix in a cidr6 annotation must not silently slip through.
 	if _, err := ParseAnnotations(map[string]string{
 		annotationPrefix + "cidr6": "10.0.0.0/8",
 	}, BuiltinFamilyDefaults()); err == nil {
 		t.Fatalf("expected family mismatch error")
 	}
 	if _, err := ParseAnnotations(map[string]string{
 		annotationPrefix + "cidr4": "2602:817::/32",
 	}, BuiltinFamilyDefaults()); err == nil {
 		t.Fatalf("expected family mismatch error")
 	}
 }
 func TestParseAnnotations_Anycast_Mixed(t *testing.T) {
 	// Anycast accepts both families together — typical for a service that
 	// advertises one v6 and one v4 anycast IP.
 	a, err := ParseAnnotations(map[string]string{
 		annotationPrefix + "anycast": "2602:817:3000:ac::1, 172.25.255.1",
 	}, BuiltinFamilyDefaults())
 	if err != nil {
 		t.Fatal(err)
 	}
 	if len(a.Anycast) != 2 {
 		t.Fatalf("anycast len=%d", len(a.Anycast))
 	}
 }
 func TestParseAnnotations_Addresses_Mixed(t *testing.T) {
 	// Plex's case: one v6 and one v4 supplied via addresses, both families
 	// enabled (built-in defaults). Both IPs are recorded; conflict check
 	// passes; later in handlers.Add they get peeled into primary slots.
 	a, err := ParseAnnotations(map[string]string{
 		annotationPrefix + "addresses": "2602:817:3000:c606::166, 142.202.202.166",
 	}, BuiltinFamilyDefaults())
 	if err != nil {
 		t.Fatal(err)
 	}
 	if len(a.Addresses) != 2 {
 		t.Fatalf("addresses len=%d", len(a.Addresses))
 	}
 }
 func TestParseAnnotations_Addresses_ConflictV4Disabled(t *testing.T) {
 	// addresses contains a v4 but the pod has explicitly opted out of v4.
 	// The IP would land on eth0 with no default v4 route, so reject at ADD.
 	_, err := ParseAnnotations(map[string]string{
 		annotationPrefix + "ipv4":      "false",
 		annotationPrefix + "addresses": "142.202.202.166",
 	}, BuiltinFamilyDefaults())
 	if err == nil {
 		t.Fatal("want error for ipv4=false + addresses v4, got nil")
 	}
 }
 func TestParseAnnotations_Addresses_ConflictV6Disabled(t *testing.T) {
 	_, err := ParseAnnotations(map[string]string{
 		annotationPrefix + "ipv6":      "false",
 		annotationPrefix + "ipv4":      "true",
 		annotationPrefix + "addresses": "2602:817:3000:c606::166",
 	}, BuiltinFamilyDefaults())
 	if err == nil {
 		t.Fatal("want error for ipv6=false + addresses v6, got nil")
 	}
 }
 func TestParseAnnotations_Anycast_ConflictV4Disabled(t *testing.T) {
 	// Anycast on lo also requires the family enabled — replies need the
 	// in-pod default v4 route off eth0, which only exists when v4 is on.
 	_, err := ParseAnnotations(map[string]string{
 		annotationPrefix + "ipv4":    "false",
 		annotationPrefix + "anycast": "172.25.255.1",
 	}, BuiltinFamilyDefaults())
 	if err == nil {
 		t.Fatal("want error for ipv4=false + anycast v4, got nil")
 	}
 }
 func TestParseAnnotations_Anycast_ConflictV6Disabled(t *testing.T) {
 	_, err := ParseAnnotations(map[string]string{
 		annotationPrefix + "ipv6":    "false",
 		annotationPrefix + "ipv4":    "true",
 		annotationPrefix + "anycast": "2602:817:3000:ac::1",
 	}, BuiltinFamilyDefaults())
 	if err == nil {
 		t.Fatal("want error for ipv6=false + anycast v6, got nil")
 	}
 }
 func TestParseAnnotations_Addresses_NodeDefaultV4Off(t *testing.T) {
 	// NodeConfig default opts v4 off for the node, and the pod has no
 	// explicit ipv4 annotation. addresses-v4 still conflicts because the
 	// resolved WantV4 is false. Operator must add `ipv4: "true"` on the
 	// pod to override the node default.
 	defaults := FamilyDefaults{WantV6: true, WantV4: false}
 	_, err := ParseAnnotations(map[string]string{
 		annotationPrefix + "addresses": "142.202.202.166",
 	}, defaults)
 	if err == nil {
 		t.Fatal("want error for NodeConfig v4=false + addresses v4, got nil")
 	}
 }
 func TestParseAnnotations_Addresses_NodeDefaultV4Off_PodOptsBackIn(t *testing.T) {
 	// Same as above but pod explicitly sets ipv4=true to override the node
 	// default. Conflict resolved; parse succeeds.
 	defaults := FamilyDefaults{WantV6: true, WantV4: false}
 	a, err := ParseAnnotations(map[string]string{
 		annotationPrefix + "ipv4":      "true",
 		annotationPrefix + "addresses": "142.202.202.166",
 	}, defaults)
 	if err != nil {
 		t.Fatalf("expected ok, got %v", err)
 	}
 	if !a.WantV4 || len(a.Addresses) != 1 {
 		t.Fatalf("unexpected: %+v", a)
 	}
 }
 func TestParseCNIArgs(t *testing.T) {
 	args := ParseCNIArgs("IgnoreUnknown=1;K8S_POD_NAMESPACE=mail;K8S_POD_NAME=stalwart-0;K8S_POD_INFRA_CONTAINER_ID=abc123")
 	if args.PodNamespace != "mail" || args.PodName != "stalwart-0" || args.InfraID != "abc123" {
 		t.Fatalf("ParseCNIArgs got %+v", args)
 	}
 }
 func TestParseCNIArgs_EmptyAndMalformed(t *testing.T) {
 	// Permissive: malformed entries are skipped, never crash.
 	a := ParseCNIArgs("")
 	if a.PodName != "" {
 		t.Fatalf("empty input should yield empty CNIArgs, got %+v", a)
 	}
 	a = ParseCNIArgs(";;K8S_POD_NAMESPACE=ns;noequalshere;=novalue;K8S_POD_NAME=p")
 	if a.PodNamespace != "ns" || a.PodName != "p" {
 		t.Fatalf("permissive parse failed: %+v", a)
 	}
 }
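The permissive behaviour these two tests pin down can be sketched in isolation. This is a minimal illustration, not the repository's ParseCNIArgs: the `cniArgs` struct and `parseArgs` function are hypothetical stand-ins, and the only behaviour assumed is what the tests above assert (split on `;`, skip anything that is not a well-formed KEY=VALUE pair).

```go
package main

import (
	"fmt"
	"strings"
)

// cniArgs is a hypothetical stand-in for the CNIArgs struct under test.
type cniArgs struct {
	PodNamespace, PodName, InfraID string
}

// parseArgs splits a CNI_ARGS string on ';' and keeps only well-formed
// KEY=VALUE pairs for the keys it knows. Everything else is skipped.
func parseArgs(s string) cniArgs {
	var out cniArgs
	for _, kv := range strings.Split(s, ";") {
		k, v, ok := strings.Cut(kv, "=")
		if !ok || k == "" {
			continue // empty segment, no '=', or empty key
		}
		switch k {
		case "K8S_POD_NAMESPACE":
			out.PodNamespace = v
		case "K8S_POD_NAME":
			out.PodName = v
		case "K8S_POD_INFRA_CONTAINER_ID":
			out.InfraID = v
		}
	}
	return out
}

func main() {
	fmt.Printf("%+v\n", parseArgs(";;K8S_POD_NAMESPACE=ns;noequalshere;=novalue;K8S_POD_NAME=p"))
}
```

Unknown keys such as `IgnoreUnknown` fall through the switch untouched, so the parser never has to fail.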
@@ -0,0 +1,112 @@
 package agent
 import (
 	"net"
 	"sort"
 )
 // anycastNexthop is one (host-side veth, pod-eth0-IP) pair the kernel route
 // can use as a multipath nexthop.
 type anycastNexthop struct {
 	hostIface string
 	via       net.IP
 }
 // anycastTarget describes the kernel route shape for one advertised anycast
 // IP. When more than one Ready pod on this node binds the same anycast IP,
 // every Ready pod contributes a nexthop and the kernel does per-flow ECMP
 // across them.
 //
 // nexthops is sorted by canonical(via) for deterministic comparison and
 // stable kernel-route ordering across reconcile passes — the
 // AnycastReconciler skips kernel writes when the new and old targets are
 // equal, which only works if the slice order is stable.
 type anycastTarget struct {
 	nexthops []anycastNexthop
 }
 // equal reports whether two targets describe the same kernel route.
 // Both sides are expected to be sorted (the canonical constructor sorts).
 func (t anycastTarget) equal(o anycastTarget) bool {
 	if len(t.nexthops) != len(o.nexthops) {
 		return false
 	}
 	for i := range t.nexthops {
 		if t.nexthops[i].hostIface != o.nexthops[i].hostIface {
 			return false
 		}
 		if !t.nexthops[i].via.Equal(o.nexthops[i].via) {
 			return false
 		}
 	}
 	return true
 }
 // resolveAnycastTargets walks the committed allocation set and returns the
 // desired kernel-route shape for every anycast IP that has at least one
 // Ready local pod binding it. Multiple Ready pods sharing the same anycast
 // IP collapse into a single multi-nexthop target so the kernel can
 // per-flow ECMP across them.
 //
 // Pure: no kernel calls, no informer access. Pods are surfaced via the
 // isReady callback so the reconciler can plug in its informer; tests can
 // pass any function that satisfies the signature.
 //
 // warn is invoked for human-facing skip reasons (e.g. anycast with no
 // unicast of same family). nil-safe — pass nil to silently drop.
 func resolveAnycastTargets(
 	allocations []Allocation,
 	isReady func(namespace, name string) bool,
 	warn func(string),
 ) map[string]anycastTarget {
 	if warn == nil {
 		warn = func(string) {}
 	}
 	out := map[string]anycastTarget{}
 	for _, a := range allocations {
 		if a.State != StateCommitted || (len(a.Anycast) == 0 && len(a.Addresses) == 0) {
 			continue
 		}
 		if !isReady(a.Namespace, a.PodName) {
 			continue
 		}
 		host := HostIfaceName(a.ContainerID)
 		via6 := net.ParseIP(a.IP6)
 		via4 := net.ParseIP(a.IP4)
 		// Anycast (lo-bound) and Addresses (eth0-bound) are advertised
 		// identically: /128 or /32 host route on the host, BGP via BIRD.
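 		// Copy before appending so a.Anycast's backing array, which is
 		// shared with the caller's snapshot, is never written to.
 		for _, ipStr := range append(append([]string(nil), a.Anycast...), a.Addresses...) {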
 			ip := net.ParseIP(ipStr)
 			if ip == nil {
 				continue
 			}
 			var via net.IP
 			if ip.To4() != nil {
 				via = via4
 			} else {
 				via = via6
 			}
 			if via == nil {
 				warn("anycast " + ipStr + " skipped: pod " +
 					a.Namespace + "/" + a.PodName +
 					" has no unicast of same family")
 				continue
 			}
 			key := canonical(ip)
 			t := out[key]
 			t.nexthops = append(t.nexthops, anycastNexthop{hostIface: host, via: via})
 			out[key] = t
 		}
 	}
 	// Sort each target's nexthops for stable comparison + stable kernel
 	// ordering. Sort key is canonical(via) — sufficient for stability
 	// because (host, via) pairs are 1:1 (one veth per pod, one v6+v4 per
 	// pod, so via uniquely identifies the nexthop).
 	for k, t := range out {
 		sort.Slice(t.nexthops, func(i, j int) bool {
 			return canonical(t.nexthops[i].via) < canonical(t.nexthops[j].via)
 		})
 		out[k] = t
 	}
 	return out
 }
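The group-then-sort step in resolveAnycastTargets can be tried on its own with a reduced data model. The sketch below is an assumption-laden simplification: `hop`, `binding`, and `group` are hypothetical names, readiness and family matching are left out, and the sort key is the plain `String()` form of the nexthop IP rather than the repository's `canonical` helper.

```go
package main

import (
	"fmt"
	"net"
	"sort"
)

// hop is a hypothetical, simplified stand-in for anycastNexthop.
type hop struct {
	iface string
	via   net.IP
}

// binding says "the pod behind iface/via serves the anycast address ip".
type binding struct {
	ip, iface, via string
}

// group collects every binding of the same anycast IP into one nexthop
// list, then sorts each list by the textual form of via so that calls
// with reordered input yield identical slices.
func group(bs []binding) map[string][]hop {
	out := map[string][]hop{}
	for _, b := range bs {
		ip, via := net.ParseIP(b.ip), net.ParseIP(b.via)
		if ip == nil || via == nil {
			continue // unparseable entries are skipped
		}
		key := ip.String()
		out[key] = append(out[key], hop{iface: b.iface, via: via})
	}
	for _, hops := range out {
		// sort.Slice sorts in place, so the slice stored in the map is
		// sorted too (the loop variable shares its backing array).
		sort.Slice(hops, func(i, j int) bool {
			return hops[i].via.String() < hops[j].via.String()
		})
	}
	return out
}

func main() {
	got := group([]binding{
		{ip: "2001:db8:a::1", iface: "veth-b", via: "2001:db8::2"},
		{ip: "2001:db8:a::1", iface: "veth-a", via: "2001:db8::1"},
	})
	for _, h := range got["2001:db8:a::1"] {
		fmt.Println(h.iface, h.via)
	}
}
```

The ordering is lexicographic on the string form, not numeric. That is enough for the purpose here, which is a stable order for the positional equality check, not a meaningful ranking of nexthops.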
@@ -26,6 +26,11 @@ import (
 //   - Pod transitions to Ready=False or DELETE → remove kernel route, remove
 //     from BIRD export.
 //
 // When more than one Ready pod on this node binds the same anycast IP, the
 // kernel route uses RTA_MULTIPATH so the kernel does per-flow ECMP across
 // the contributing pods. This is the within-node companion to BGP-level
 // ECMP across nodes.
 //
 // Reconcile is idempotent. Triggers: AfterCommit hook, Pod informer
 // UpdateFunc on Ready transitions, periodic 2s tick.
 type AnycastReconciler struct {
@@ -42,13 +47,6 @@ type AnycastReconciler struct {
 	trigger    chan struct{}
 }
 // anycastTarget describes the kernel route shape for one advertised
 // anycast IP: which veth, and which pod eth0 IP to use as next-hop.
 type anycastTarget struct {
 	hostIface string
 	via       net.IP
 }
 // NewAnycastReconciler returns a Reconciler ready to Run.
 func NewAnycastReconciler(node string, store *Store, pods *PodCache, nc *NodeConfigCache, bird *BirdManager, routerID string, logger *slog.Logger) *AnycastReconciler {
 	return &AnycastReconciler{
@@ -96,25 +94,26 @@ func (r *AnycastReconciler) reconcile() {
 	desired := r.computeDesired()
-	// Install routes that should exist but don't (or whose target changed).
+	// Install routes that should exist but don't, or whose nexthop set
 	// changed.
 	for ip, t := range desired {
-		if cur, ok := r.advertised[ip]; ok && cur.hostIface == t.hostIface && cur.via.Equal(t.via) {
+		if cur, ok := r.advertised[ip]; ok && cur.equal(t) {
 			continue
 		}
 		if err := installAnycastRoute(ip, t); err != nil {
-			r.Logger.Warn("anycast install", "ip", ip, "host", t.hostIface, "via", t.via, "err", err)
+			r.Logger.Warn("anycast install", "ip", ip, "nexthops", len(t.nexthops), "err", err)
 			continue
 		}
-		r.Logger.Info("anycast advertise", "ip", ip, "host", t.hostIface, "via", t.via)
+		r.Logger.Info("anycast advertise", "ip", ip, "nexthops", describeNexthops(t))
 		r.advertised[ip] = t
 	}
 	// Remove routes that exist but shouldn't.
 	for ip, t := range r.advertised {
 		if _, want := desired[ip]; !want {
 			if err := removeAnycastRoute(ip, t); err != nil {
-				r.Logger.Warn("anycast remove", "ip", ip, "host", t.hostIface, "err", err)
+				r.Logger.Warn("anycast remove", "ip", ip, "err", err)
 			} else {
-				r.Logger.Info("anycast withdraw", "ip", ip, "host", t.hostIface)
+				r.Logger.Info("anycast withdraw", "ip", ip)
 			}
 			delete(r.advertised, ip)
 		}
@@ -124,44 +123,17 @@ func (r *AnycastReconciler) reconcile() {
 	r.renderBird(desired)
 }
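The two passes in reconcile (install what is missing or changed, then remove what is no longer wanted) amount to a map diff. A minimal sketch of that diff follows, assuming each target can be summarised as an opaque string; `plan` and the sample data are hypothetical and no kernel or BIRD calls are involved.

```go
package main

import (
	"fmt"
	"sort"
)

// plan compares desired and current state, both keyed by IP with an
// opaque string describing the nexthop set, and returns which keys need
// an install or replace and which need removal.
func plan(desired, current map[string]string) (install, remove []string) {
	for ip, want := range desired {
		if have, ok := current[ip]; ok && have == want {
			continue // already programmed with the same nexthops
		}
		install = append(install, ip)
	}
	for ip := range current {
		if _, ok := desired[ip]; !ok {
			remove = append(remove, ip)
		}
	}
	// Map iteration order is random; sort for reproducible output.
	sort.Strings(install)
	sort.Strings(remove)
	return install, remove
}

func main() {
	desired := map[string]string{
		"10.255.0.1": "vethA>10.0.0.1",
		"10.255.0.2": "vethB>10.0.0.2,vethC>10.0.0.3",
	}
	current := map[string]string{
		"10.255.0.1": "vethA>10.0.0.1",
		"10.255.0.2": "vethB>10.0.0.2",
		"10.255.0.9": "vethZ>10.0.0.9",
	}
	install, remove := plan(desired, current)
	fmt.Println("install:", install)
	fmt.Println("remove:", remove)
}
```

Because the unchanged entry is skipped, running the diff again after applying it yields two empty lists, which is the idempotence the reconciler relies on.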
-// computeDesired walks the Store and returns the per-ip anycastTarget for
+// computeDesired delegates to the pure resolveAnycastTargets and plugs in
-// every anycast advertisement that should be active right now. Each target
+// the live informer-based isReady callback.
 // uses the pod's own eth0 IP (same family) as the route's `via` next-hop —
 // that way kernel NDP/ARP resolves the eth0 address, which IS configured
 // on the pod's eth0, so the pod responds normally without proxy_ndp.
 func (r *AnycastReconciler) computeDesired() map[string]anycastTarget {
-	out := map[string]anycastTarget{}
+	return resolveAnycastTargets(
-	for _, a := range r.Store.Snapshot() {
+		r.Store.Snapshot(),
-		if a.State != StateCommitted || len(a.Anycast) == 0 {
+		func(ns, name string) bool {
-			continue
+			pod, ok := r.Pods.Get(ns, name)
-		}
+			return ok && podAnycastEligible(pod)
-		pod, ok := r.Pods.Get(a.Namespace, a.PodName)
+		},
-		if !ok || !podReady(pod) {
+		func(s string) { r.Logger.Warn(s) },
-			continue
+	)
 		}
 		host := HostIfaceName(a.ContainerID)
 		via6 := net.ParseIP(a.IP6)
 		via4 := net.ParseIP(a.IP4)
 		for _, ipStr := range a.Anycast {
 			ip := net.ParseIP(ipStr)
 			if ip == nil {
 				continue
 			}
 			var via net.IP
 			if ip.To4() != nil {
 				via = via4
 			} else {
 				via = via6
 			}
 			if via == nil {
 				r.Logger.Warn("anycast skipped: pod has no unicast IP of same family",
 					"pod", a.Namespace+"/"+a.PodName, "anycast", ipStr)
 				continue
 			}
 			out[canonical(ip)] = anycastTarget{hostIface: host, via: via}
 		}
 	}
 	return out
 }
 func (r *AnycastReconciler) renderBird(desired map[string]anycastTarget) {
@@ -170,72 +142,139 @@ func (r *AnycastReconciler) renderBird(desired map[string]anycastTarget) {
 		return
 	}
 	var v6, v4 []string
-	for ipStr := range desired {
+	seen := map[string]struct{}{}
-		ip := net.ParseIP(ipStr)
+	add := func(ip net.IP) {
-		if ip == nil {
+		key := canonical(ip)
-			continue
+		if _, dup := seen[key]; dup {
 			return
 		}
 		seen[key] = struct{}{}
 		if ip.To4() != nil {
 			v4 = append(v4, ip.To4().String())
 		} else {
 			v6 = append(v6, ip.To16().String())
 		}
 	}
 	for ipStr := range desired {
 		if ip := net.ParseIP(ipStr); ip != nil {
 			add(ip)
 		}
 	}
 	// A pod IP that lives outside the node's BGP aggregate (e.g. an
 	// addresses-annotation IP promoted to be the pod's primary v4 — Plex's
 	// 142.202.202.166 against host004's 172.25.214.0/24) is not naturally
 	// covered by the aggregate, so it must be advertised individually as a
 	// /32 or /128. Anycast and addresses extras are already covered by the
 	// `desired` loop above; this sweep is for promoted-primary IPs which do
 	// not flow through the AnycastReconciler.
 	nodeV6, nodeV4 := parseNodeCIDRs(nc)
 	for _, a := range r.Store.Snapshot() {
 		if a.State != StateCommitted {
 			continue
 		}
 		if ip := net.ParseIP(a.IP6); ip != nil && !ipInAny(ip, nodeV6) {
 			add(ip)
 		}
 		if ip := net.ParseIP(a.IP4); ip != nil && !ipInAny(ip, nodeV4) {
 			add(ip)
 		}
 	}
 	if err := r.Bird.Render(nc, v6, v4, r.RouterID); err != nil {
 		r.Logger.Warn("anycast bird render", "err", err)
 	}
 }
-// installAnycastRoute installs `<ipStr>/<128|32> via t.via dev t.hostIface`.
+// parseNodeCIDRs parses NodeConfig.Spec.CIDR6/4 strings into IPNets,
 // silently dropping malformed entries (admission-time validation should
 // have rejected them long before this point).
 func parseNodeCIDRs(nc *flockv1alpha1.NodeConfig) (v6, v4 []*net.IPNet) {
 	for _, s := range nc.Spec.CIDR6 {
 		if _, n, err := net.ParseCIDR(s); err == nil {
 			v6 = append(v6, n)
 		}
 	}
 	for _, s := range nc.Spec.CIDR4 {
 		if _, n, err := net.ParseCIDR(s); err == nil {
 			v4 = append(v4, n)
 		}
 	}
 	return
 }
 func ipInAny(ip net.IP, nets []*net.IPNet) bool {
 	for _, n := range nets {
 		if n.Contains(ip) {
 			return true
 		}
 	}
 	return false
 }
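The outside-aggregate check used by renderBird reduces to "parse the node CIDRs, then test containment". The sketch below folds both helpers into one hypothetical function, `outsideAggregate`, that takes strings so it can run without the NodeConfig type; the sample addresses come from the comment in renderBird.

```go
package main

import (
	"fmt"
	"net"
)

// outsideAggregate reports whether ip falls outside every CIDR in
// aggregates. Malformed CIDRs are ignored; an unparseable ip yields false.
func outsideAggregate(ip string, aggregates []string) bool {
	p := net.ParseIP(ip)
	if p == nil {
		return false
	}
	for _, s := range aggregates {
		_, n, err := net.ParseCIDR(s)
		if err != nil {
			continue
		}
		if n.Contains(p) {
			return false
		}
	}
	return true
}

func main() {
	agg := []string{"172.25.214.0/24"}
	fmt.Println(outsideAggregate("142.202.202.166", agg)) // promoted public IP
	fmt.Println(outsideAggregate("172.25.214.17", agg))   // ordinary in-aggregate IP
}
```

Only addresses for which this check is true need an individual /32 or /128 advertisement; the rest are already covered by the node aggregate.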
 // installAnycastRoute installs `<ipStr>/<128|32>` pointing at the
 // nexthop set in t. With one nexthop the route is a plain via-route;
 // with multiple, it's a multipath route using RTA_MULTIPATH so the
 // kernel hashes flows across the constituent pods.
 //
 // Idempotent — RouteReplace overwrites a stale entry.
 func installAnycastRoute(ipStr string, t anycastTarget) error {
 	ip := net.ParseIP(ipStr)
 	if ip == nil {
 		return fmt.Errorf("bad ip %q", ipStr)
 	}
-	link, err := netlink.LinkByName(t.hostIface)
+	if len(t.nexthops) == 0 {
-	if err != nil {
+		return fmt.Errorf("anycast %s: no nexthops", ipStr)
 		return fmt.Errorf("lookup %s: %w", t.hostIface, err)
 	}
 	prefix := 128
 	if ip.To4() != nil {
 		prefix = 32
 		ip = ip.To4()
 	}
-	r := &netlink.Route{
+	r := &netlink.Route{Dst: cidrFor(ip, prefix)}
 	if len(t.nexthops) == 1 {
 		// Single nexthop — keep the route shape identical to today's
 		// production form. Functionally equivalent to a 1-element
 		// MultiPath but `ip route show` renders nicer for operators.
 		nh := t.nexthops[0]
 		link, err := netlink.LinkByName(nh.hostIface)
 		if err != nil {
 			return fmt.Errorf("lookup %s: %w", nh.hostIface, err)
 		}
 		r.LinkIndex = link.Attrs().Index
 		r.Gw = nh.via
 	} else {
 		hops := make([]*netlink.NexthopInfo, 0, len(t.nexthops))
 		for _, nh := range t.nexthops {
 			link, err := netlink.LinkByName(nh.hostIface)
 			if err != nil {
 				return fmt.Errorf("lookup %s: %w", nh.hostIface, err)
 			}
 			hops = append(hops, &netlink.NexthopInfo{
 				LinkIndex: link.Attrs().Index,
-		Dst:       cidrFor(ip, prefix),
+				Gw:        nh.via,
-		Gw:        t.via,
+				Hops:      0,
-		// SCOPE_UNIVERSE — the gateway is on a different "logical" subnet
+			})
-		// than the local /128 route, but reachable on this veth. Linux is
+		}
-		// happy as long as the veth has IPv6 forwarding on (it does — set
+		r.MultiPath = hops
 		// in configureHostSide) and the pod's eth0 has the via address
 		// (also true — that's the pod's IP6/IP4 we allocated).
 	}
 	return netlink.RouteReplace(r)
 }
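For operators it can help to see the two route shapes as the roughly equivalent iproute2 commands. The sketch below only renders command strings and never touches the kernel; `hop` and `routeCommand` are hypothetical names, and the mapping to `ip route replace` is an illustration of the single-nexthop versus multipath split, not what the agent executes (it uses netlink directly).

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// hop is a hypothetical (interface, gateway) pair.
type hop struct{ iface, via string }

// routeCommand renders an `ip route replace` command line for a host
// route to dst: a plain via-route for one nexthop, a multipath route
// with one `nexthop` clause per entry otherwise.
func routeCommand(dst string, hops []hop) (string, error) {
	ip := net.ParseIP(dst)
	if ip == nil {
		return "", fmt.Errorf("bad ip %q", dst)
	}
	if len(hops) == 0 {
		return "", fmt.Errorf("anycast %s: no nexthops", dst)
	}
	fam, prefix := "-6", 128
	if v4 := ip.To4(); v4 != nil {
		fam, prefix, ip = "-4", 32, v4
	}
	var b strings.Builder
	fmt.Fprintf(&b, "ip %s route replace %s/%d", fam, ip, prefix)
	if len(hops) == 1 {
		fmt.Fprintf(&b, " via %s dev %s", hops[0].via, hops[0].iface)
		return b.String(), nil
	}
	for _, h := range hops {
		fmt.Fprintf(&b, " nexthop via %s dev %s", h.via, h.iface)
	}
	return b.String(), nil
}

func main() {
	cmd, err := routeCommand("2001:db8:a::1", []hop{
		{iface: "vethA", via: "2001:db8::1"},
		{iface: "vethB", via: "2001:db8::2"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cmd)
}
```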
 // removeAnycastRoute deletes the host route. Missing routes / interfaces
 // are treated as success — DEL paths can race with veth teardown.
-func removeAnycastRoute(ipStr string, t anycastTarget) error {
+//
 // Kernel route deletion matches by destination prefix; we don't need to
 // re-specify the nexthop set.
 func removeAnycastRoute(ipStr string, _ anycastTarget) error {
 	ip := net.ParseIP(ipStr)
 	if ip == nil {
 		return nil
 	}
 	link, err := netlink.LinkByName(t.hostIface)
 	if err != nil {
 		return nil
 	}
 	prefix := 128
 	if ip.To4() != nil {
 		prefix = 32
 		ip = ip.To4()
 	}
-	r := &netlink.Route{
+	r := &netlink.Route{Dst: cidrFor(ip, prefix)}
 		LinkIndex: link.Attrs().Index,
 		Dst:       cidrFor(ip, prefix),
 		Gw:        t.via,
 	}
 	if err := netlink.RouteDel(r); err != nil {
 		// ESRCH ("no such process") is netlink-speak for "no such route";
 		// treat as success.
@@ -247,5 +286,17 @@ func removeAnycastRoute(ipStr string, t anycastTarget) error {
 	return nil
 }
 // describeNexthops returns a compact string for log messages.
 func describeNexthops(t anycastTarget) string {
 	var s string
 	for i, nh := range t.nexthops {
 		if i > 0 {
 			s += ","
 		}
 		s += nh.hostIface + "→" + nh.via.String()
 	}
 	return s
 }
 // _ = flockv1alpha1 to silence unused import warnings on minimal builds.
 var _ = flockv1alpha1.GroupName
@@ -0,0 +1,227 @@
 package agent
 import (
 	"net"
 	"strings"
 	"testing"
 )
 // allReady is a convenience isReady that says yes to every pod.
 func allReady(_, _ string) bool { return true }
 // readyOnly returns an isReady that only says yes to the named pods.
 func readyOnly(want ...string) func(string, string) bool {
 	set := map[string]struct{}{}
 	for _, n := range want {
 		set[n] = struct{}{}
 	}
 	return func(_, name string) bool {
 		_, ok := set[name]
 		return ok
 	}
 }
 func TestResolveAnycastTargets_OnePodOneAnycast(t *testing.T) {
 	allocs := []Allocation{{
 		ContainerID: "c1", Namespace: "ns", PodName: "pod-a",
 		State:       StateCommitted,
 		IP6:         "2001:db8::1",
 		Anycast:     []string{"2001:db8:a::1"},
 	}}
 	out := resolveAnycastTargets(allocs, allReady, nil)
 	if len(out) != 1 {
 		t.Fatalf("expected 1 anycast IP, got %d", len(out))
 	}
 	tgt, ok := out["2001:db8:a::1"]
 	if !ok {
 		t.Fatalf("missing target")
 	}
 	if len(tgt.nexthops) != 1 {
 		t.Fatalf("expected 1 nexthop, got %d", len(tgt.nexthops))
 	}
 	if !tgt.nexthops[0].via.Equal(net.ParseIP("2001:db8::1")) {
 		t.Fatalf("nexthop via wrong: %v", tgt.nexthops[0].via)
 	}
 }
 // Two pods on the same node binding the same anycast IP must produce a
 // SINGLE target with TWO nexthops. The previous behaviour (overwriting)
 // was the bug this whole change exists to fix.
 func TestResolveAnycastTargets_TwoPodsSameAnycast_MultiNexthop(t *testing.T) {
 	allocs := []Allocation{
 		{ContainerID: "c1", Namespace: "ns", PodName: "pod-a",
 			State: StateCommitted, IP6: "2001:db8::2",
 			Anycast: []string{"2001:db8:a::1"}},
 		{ContainerID: "c2", Namespace: "ns", PodName: "pod-b",
 			State: StateCommitted, IP6: "2001:db8::1",
 			Anycast: []string{"2001:db8:a::1"}},
 	}
 	out := resolveAnycastTargets(allocs, allReady, nil)
 	tgt := out["2001:db8:a::1"]
 	if len(tgt.nexthops) != 2 {
 		t.Fatalf("expected 2 nexthops, got %d", len(tgt.nexthops))
 	}
 	// Order should be sorted by canonical(via) — ::1 before ::2.
 	if !tgt.nexthops[0].via.Equal(net.ParseIP("2001:db8::1")) {
 		t.Fatalf("nexthops not sorted by via; got %v first", tgt.nexthops[0].via)
 	}
 	if !tgt.nexthops[1].via.Equal(net.ParseIP("2001:db8::2")) {
 		t.Fatalf("nexthops not sorted by via; got %v second", tgt.nexthops[1].via)
 	}
 	// HostIface differs per pod (different containerID → different FNV).
 	if tgt.nexthops[0].hostIface == tgt.nexthops[1].hostIface {
 		t.Fatalf("expected distinct hostIfaces, both %q", tgt.nexthops[0].hostIface)
 	}
 }
 // When one of the contributing pods goes NotReady, only the remaining
 // Ready pod should appear in the target's nexthop set.
 func TestResolveAnycastTargets_NotReadyDropped(t *testing.T) {
 	allocs := []Allocation{
 		{ContainerID: "c1", Namespace: "ns", PodName: "pod-a",
 			State: StateCommitted, IP6: "2001:db8::1",
 			Anycast: []string{"2001:db8:a::1"}},
 		{ContainerID: "c2", Namespace: "ns", PodName: "pod-b",
 			State: StateCommitted, IP6: "2001:db8::2",
 			Anycast: []string{"2001:db8:a::1"}},
 	}
 	out := resolveAnycastTargets(allocs, readyOnly("pod-a"), nil)
 	tgt := out["2001:db8:a::1"]
 	if len(tgt.nexthops) != 1 {
 		t.Fatalf("expected 1 nexthop after NotReady drop, got %d", len(tgt.nexthops))
 	}
 	if !tgt.nexthops[0].via.Equal(net.ParseIP("2001:db8::1")) {
 		t.Fatalf("wrong surviving nexthop: %v", tgt.nexthops[0].via)
 	}
 }
 // Pods that haven't reached Ready are excluded entirely from the target
 // set. If no pod is Ready for an anycast IP, that IP is absent from the
 // output (BIRD will withdraw from BGP, kernel route will be removed).
 func TestResolveAnycastTargets_NoReadyPodsOmitsIP(t *testing.T) {
 	allocs := []Allocation{
 		{ContainerID: "c1", Namespace: "ns", PodName: "pod-a",
 			State: StateCommitted, IP6: "2001:db8::1",
 			Anycast: []string{"2001:db8:a::1"}},
 	}
 	out := resolveAnycastTargets(allocs, readyOnly( /* none */ ), nil)
 	if _, ok := out["2001:db8:a::1"]; ok {
 		t.Fatalf("anycast should be absent when no pod ready")
 	}
 }
 // Pending allocations (CNI ADD partway through) are skipped even if the
 // pod is Ready — we don't program kernel routes for partial setups.
 func TestResolveAnycastTargets_PendingSkipped(t *testing.T) {
 	allocs := []Allocation{
 		{ContainerID: "c1", Namespace: "ns", PodName: "pod-a",
 			State: StatePending, IP6: "2001:db8::1",
 			Anycast: []string{"2001:db8:a::1"}},
 	}
 	out := resolveAnycastTargets(allocs, allReady, nil)
 	if len(out) != 0 {
 		t.Fatalf("pending allocations must be skipped")
 	}
 }
 // Mixed v6+v4 anycast on the same pod produces two separate target
 // entries, one per family, each anchored on the matching unicast IP.
 func TestResolveAnycastTargets_MixedFamilies(t *testing.T) {
 	allocs := []Allocation{{
 		ContainerID: "c1", Namespace: "ns", PodName: "pod-a",
 		State:       StateCommitted,
 		IP6:         "2001:db8::1",
 		IP4:         "10.0.0.1",
 		Anycast:     []string{"2001:db8:a::1", "10.255.0.1"},
 	}}
 	out := resolveAnycastTargets(allocs, allReady, nil)
 	if !out["2001:db8:a::1"].nexthops[0].via.Equal(net.ParseIP("2001:db8::1")) {
 		t.Fatalf("v6 anycast should resolve via v6 unicast")
 	}
 	if !out["10.255.0.1"].nexthops[0].via.Equal(net.ParseIP("10.0.0.1").To4()) {
 		t.Fatalf("v4 anycast should resolve via v4 unicast")
 	}
 }
 // An anycast whose family has no matching unicast on the pod is skipped
 // with a warning. Other anycast IPs on the same pod are unaffected.
 func TestResolveAnycastTargets_FamilyMismatchWarns(t *testing.T) {
 	allocs := []Allocation{{
 		ContainerID: "c1", Namespace: "ns", PodName: "pod-a",
 		State:       StateCommitted,
 		IP6:         "2001:db8::1", // v6 only
 		Anycast:     []string{"2001:db8:a::1", "10.255.0.1"},
 	}}
 	var warns []string
 	out := resolveAnycastTargets(allocs, allReady, func(s string) { warns = append(warns, s) })
 	if _, has := out["2001:db8:a::1"]; !has {
 		t.Fatalf("v6 anycast should have been programmed")
 	}
 	if _, has := out["10.255.0.1"]; has {
 		t.Fatalf("v4 anycast should have been skipped")
 	}
 	if len(warns) != 1 {
 		t.Fatalf("expected 1 warning, got %d: %v", len(warns), warns)
 	}
 	if !strings.Contains(warns[0], "10.255.0.1") {
 		t.Fatalf("warning should mention skipped IP: %q", warns[0])
 	}
 }
 // Determinism: the same input must produce nexthops in the same order.
 func TestResolveAnycastTargets_Determinism(t *testing.T) {
 	allocs := []Allocation{
 		{ContainerID: "z-late", Namespace: "ns", PodName: "z",
 			State: StateCommitted, IP6: "2001:db8::5",
 			Anycast: []string{"2001:db8:a::1"}},
 		{ContainerID: "a-early", Namespace: "ns", PodName: "a",
 			State: StateCommitted, IP6: "2001:db8::3",
 			Anycast: []string{"2001:db8:a::1"}},
 		{ContainerID: "m-mid", Namespace: "ns", PodName: "m",
 			State: StateCommitted, IP6: "2001:db8::4",
 			Anycast: []string{"2001:db8:a::1"}},
 	}
 	a := resolveAnycastTargets(allocs, allReady, nil)
 	b := resolveAnycastTargets(allocs, allReady, nil)
 	if !a["2001:db8:a::1"].equal(b["2001:db8:a::1"]) {
 		t.Fatalf("same input produced unequal targets")
 	}
 	// Sorted by canonical(via): ::3, ::4, ::5
 	via := a["2001:db8:a::1"].nexthops
 	if !via[0].via.Equal(net.ParseIP("2001:db8::3")) ||
 		!via[1].via.Equal(net.ParseIP("2001:db8::4")) ||
 		!via[2].via.Equal(net.ParseIP("2001:db8::5")) {
 		t.Fatalf("nexthops not stably sorted: %v %v %v", via[0].via, via[1].via, via[2].via)
 	}
 }
 // equal()'s contract: comparison is positional, so two targets are equal
 // only when their nexthops match element by element. resolveAnycastTargets
 // sorts its output, which is why across-call comparisons of resolver
 // outputs always match for the same logical input.
 func TestAnycastTarget_Equal(t *testing.T) {
 	a := anycastTarget{nexthops: []anycastNexthop{
 		{hostIface: "f1", via: net.ParseIP("2001:db8::1")},
 		{hostIface: "f2", via: net.ParseIP("2001:db8::2")},
 	}}
 	b := anycastTarget{nexthops: []anycastNexthop{
 		{hostIface: "f1", via: net.ParseIP("2001:db8::1")},
 		{hostIface: "f2", via: net.ParseIP("2001:db8::2")},
 	}}
 	if !a.equal(b) {
 		t.Fatalf("equal targets reported unequal")
 	}
 	c := anycastTarget{nexthops: []anycastNexthop{
 		{hostIface: "f1", via: net.ParseIP("2001:db8::1")},
 	}}
 	if a.equal(c) {
 		t.Fatalf("targets with different lengths reported equal")
 	}
 	d := anycastTarget{nexthops: []anycastNexthop{
 		{hostIface: "f1", via: net.ParseIP("2001:db8::1")},
 		{hostIface: "f2", via: net.ParseIP("2001:db8::3")}, // diff IP
 	}}
 	if a.equal(d) {
 		t.Fatalf("targets with different vias reported equal")
 	}
 }
@@ -55,6 +55,12 @@ func (b *BirdManager) Render(nc *flockv1alpha1.NodeConfig, anycast6, anycast4 []
 	// the BGP peer. crt001 rejects IPv6 advertisements whose next-hop is
 	// link-local-only; an explicit `source address` makes BIRD use a
 	// global next-hop self, which Cisco accepts.
 	//
 	// Also derive the connected subnet (peer IP masked to /64 v6 / /24 v4)
 	// per family. Render uses it to install `import where net != <subnet>`
 	// on the BGP channel so the gateway can't readvertise our own connected
 	// /64 back to us — accepting it would override the kernel route and
 	// hairpin all inter-host traffic via the gateway.
 	for _, p := range nc.Spec.BGP.Peers {
 		fam := bird.FamilyOf(p.Address)
 		if fam == "" {
@@ -69,6 +75,14 @@ func (b *BirdManager) Render(nc *flockv1alpha1.NodeConfig, anycast6, anycast4 []
 				in.LocalV4 = local
 			}
 		}
 		if subnet := peerSubnet(p.Address); subnet != "" {
 			if fam == "v6" && in.LocalSubnetV6 == "" {
 				in.LocalSubnetV6 = subnet
 			}
 			if fam == "v4" && in.LocalSubnetV4 == "" {
 				in.LocalSubnetV4 = subnet
 			}
 		}
 	}
 	cfg, err := bird.Render(in)
@@ -165,6 +179,25 @@ func (b *BirdManager) SummaryRoutes(nc *flockv1alpha1.NodeConfig) error {
 	return nil
 }
 // peerSubnet returns the canonical CIDR of the assumed connected subnet
 // containing `peer` — /64 for IPv6, /24 for IPv4. Returns "" if peer
 // doesn't parse. Matches the assumption already baked into
 // localAddrSameSubnet: fritzlab convention is /64 v6 and /24 v4.
 func peerSubnet(peer string) string {
 	pip := net.ParseIP(peer)
 	if pip == nil {
 		return ""
 	}
 	var mask net.IPMask
 	if pip.To4() != nil {
 		mask = net.CIDRMask(24, 32)
 	} else {
 		mask = net.CIDRMask(64, 128)
 	}
 	n := &net.IPNet{IP: pip.Mask(mask), Mask: mask}
 	return n.String()
 }
 // localAddrSameSubnet finds an IP on a local interface that's in the same
 // /64 (v6) or /24 (v4) as `peer`. Returns "" if none. Used to derive the
 // `source address` for a BGP session.
@@ -0,0 +1,25 @@
 package agent
 import "testing"
 func TestPeerSubnet(t *testing.T) {
 	cases := []struct {
 		peer string
 		want string
 	}{
 		{"2602:817:3000:a25::1", "2602:817:3000:a25::/64"},
 		{"2602:817:3000:a25::104", "2602:817:3000:a25::/64"},
 		{"172.25.25.1", "172.25.25.0/24"},
 		{"172.25.25.104", "172.25.25.0/24"},
 		{"", ""},
 		{"not-an-ip", ""},
 	}
 	for _, tc := range cases {
 		t.Run(tc.peer, func(t *testing.T) {
 			got := peerSubnet(tc.peer)
 			if got != tc.want {
 				t.Fatalf("peerSubnet(%q) = %q, want %q", tc.peer, got, tc.want)
 			}
 		})
 	}
 }
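The same /64 and /24 assumption can be used to ask whether a local address shares the peer's connected subnet, which is the question localAddrSameSubnet answers. The sketch below is a stand-alone approximation: `assumedSubnet` mirrors peerSubnet but returns a `*net.IPNet`, and `sameSubnet` is a hypothetical helper that takes the candidate address as a string instead of walking local interfaces.

```go
package main

import (
	"fmt"
	"net"
)

// assumedSubnet returns the /64 (IPv6) or /24 (IPv4) network containing
// peer, or nil if peer does not parse. The prefix lengths are a site
// convention, not something read from the interface.
func assumedSubnet(peer string) *net.IPNet {
	ip := net.ParseIP(peer)
	if ip == nil {
		return nil
	}
	mask := net.CIDRMask(64, 128)
	if ip.To4() != nil {
		mask = net.CIDRMask(24, 32)
	}
	return &net.IPNet{IP: ip.Mask(mask), Mask: mask}
}

// sameSubnet reports whether local lies in peer's assumed subnet.
func sameSubnet(peer, local string) bool {
	n := assumedSubnet(peer)
	l := net.ParseIP(local)
	return n != nil && l != nil && n.Contains(l)
}

func main() {
	fmt.Println(assumedSubnet("172.25.25.104"))
	fmt.Println(sameSubnet("2602:817:3000:a25::1", "2602:817:3000:a25::104"))
	fmt.Println(sameSubnet("172.25.25.1", "172.25.26.5"))
}
```

If a site uses other prefix lengths, both this sketch and the import filter derived from peerSubnet would cover the wrong range, so the convention is worth keeping in one place.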
@@ -0,0 +1,22 @@
 // Package agent owns the in-process flock-agent runtime. The agent is a
 // single Linux DaemonSet pod per node and holds:
 //
 //   - the durable per-node allocation file at /var/lib/flock/allocations.json
 //     (see Store in state.go),
 //   - an in-memory IPAM seeded from NodeConfig CIDRs and reconciled against
 //     the allocation file at startup (see ipam.go),
 //   - dynamic informers watching the per-node NodeConfig CR (nodeconfig.go)
 //     and the local-node Pod set (podinfo.go),
 //   - an RPC server speaking to the lightweight CNI plugin binary
 //     (cmd/flock and pkg/cni), so kubelet's CNI invocations are answered by
 //     a long-lived process rather than spinning up a fresh binary per ADD,
 //   - the BirdManager that renders bird.conf and triggers `birdc reload`
 //     on changes (bird.go), and
 //   - the AnycastReconciler that programs per-pod /128 and /32 host routes
 //     gated on Pod readiness (anycast_linux.go).
 //
 // The package is split between platform-specific files (anycast_linux.go,
 // netns_linux.go, runtime_linux.go) and stub files used on non-Linux build
 // hosts so the rest of the package — IPAM, parsing, store, RPC plumbing —
 // stays unit-testable on macOS and Windows CI.
 package agent
@@ -3,14 +3,91 @@ package agent
 import (
 	"context"
 	"fmt"
 	"log/slog"
 	"net"
 	"strings"
 	"time"
 	flockcni "code.fritzlab.net/fritzlab/flock/pkg/cni"
 	cnitypes "github.com/containernetworking/cni/pkg/types"
 	current "github.com/containernetworking/cni/pkg/types/100"
 	corev1 "k8s.io/api/core/v1"
 )
 // podTemplateHashLabel is the label the Deployment controller adds to
 // each ReplicaSet it creates and to that ReplicaSet's Pods. The
 // ReplicaSet is named "<deploy>-<hash>", so deriveAppName uses the label
 // value to strip the hash and recover the Deployment name.
 const podTemplateHashLabel = "pod-template-hash"
 // deriveAppName returns the stable workload identifier for a Pod — the
 // name of the topmost stable controller, with the pod-template-hash
 // stripped for ReplicaSet-owned pods.
 //
 // The rule maps to Kubernetes pod-name generation:
 //
 //	Deployment → ReplicaSet → Pod   pod owner is RS named "<deploy>-<hash>";
 //	                                 strip the trailing "-<hash>" to recover
 //	                                 the Deployment name.
 //	StatefulSet → Pod                pod owner is the STS itself; use as-is.
 //	DaemonSet  → Pod                 pod owner is the DS itself; use as-is.
 //	Job        → Pod                 pod owner is the Job itself; use as-is.
 //	(bare pod) → Pod                 no controller owner; fall back to pod name.
 //
 // All replicas of the same workload converge on the same return value,
 // which is the property the ip-algo `app` field needs.
 func deriveAppName(pod *corev1.Pod) string {
 	owner := controllerOwner(pod)
 	if owner == nil {
 		return pod.Name
 	}
 	if owner.Kind == "ReplicaSet" {
 		if hash, ok := pod.Labels[podTemplateHashLabel]; ok && hash != "" {
 			suffix := "-" + hash
 			if strings.HasSuffix(owner.Name, suffix) {
 				return strings.TrimSuffix(owner.Name, suffix)
 			}
 		}
 		// A custom controller named the ReplicaSet without following the
 		// pod-template-hash convention. Falling back to the ReplicaSet
 		// name still keeps replicas of the same ReplicaSet aligned, which
 		// is the next-best guarantee available.
 		return owner.Name
 	}
 	return owner.Name
 }
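The naming rule can be exercised without any Kubernetes types. In this sketch `appName` is a hypothetical string-only version of deriveAppName: the owner kind, owner name, and pod-template-hash label value are passed in directly, and an empty owner kind stands for a bare pod.

```go
package main

import (
	"fmt"
	"strings"
)

// appName applies the workload-name rule to plain strings:
//   - no controller owner: use the pod name
//   - ReplicaSet owner with a template hash: strip the "-<hash>" suffix
//   - any other owner: use the owner name as-is
func appName(podName, ownerKind, ownerName, templateHash string) string {
	if ownerKind == "" {
		return podName
	}
	if ownerKind == "ReplicaSet" && templateHash != "" {
		// TrimSuffix leaves the name unchanged when the suffix is
		// absent, which covers ReplicaSets with non-standard names.
		return strings.TrimSuffix(ownerName, "-"+templateHash)
	}
	return ownerName
}

func main() {
	fmt.Println(appName("web-7d4b9c6f5-x2x9k", "ReplicaSet", "web-7d4b9c6f5", "7d4b9c6f5"))
	fmt.Println(appName("db-0", "StatefulSet", "db", ""))
	fmt.Println(appName("debug-shell", "", "", ""))
}
```

Every replica of the same Deployment passes the same owner name and hash, so all of them map to the same result regardless of the random pod-name suffix.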
 // controllerOwner returns the OwnerReference flagged with Controller=true,
 // or nil if none. Kubernetes guarantees at most one controller per object.
 func controllerOwner(pod *corev1.Pod) *metav1OwnerLite {
 	for i := range pod.OwnerReferences {
 		o := &pod.OwnerReferences[i]
 		if o.Controller != nil && *o.Controller {
 			return &metav1OwnerLite{Kind: o.Kind, Name: o.Name}
 		}
 	}
 	return nil
 }
 // metav1OwnerLite is the subset of OwnerReference fields we actually
 // consult, kept small so it is cheap to allocate and return by pointer.
 type metav1OwnerLite struct {
 	Kind string
 	Name string
 }
 // podImageRef returns a deterministic image reference for the embed
 // `image` field. We use the first container's spec'd image — this is
 // stable across replicas of the same Deployment without requiring the
 // runtime-resolved digest. Empty string if the pod has no containers,
 // in which case the embed package falls back to FNV(containerID).
 func podImageRef(pod *corev1.Pod) string {
 	if len(pod.Spec.Containers) == 0 {
 		return ""
 	}
 	return pod.Spec.Containers[0].Image
 }
 // PodHandler is the platform-agnostic ADD/DEL/CHECK implementation. It
 // resolves the Pod from the informer cache, parses annotations, allocates
 // from IPAM, programs netns (or skips on non-Linux build), and persists
@@ -22,6 +99,7 @@ type PodHandler struct {
 	IPAM       *IPAM
 	Pods       *PodCache
 	NodeConfig *NodeConfigCache
 	Logger     *slog.Logger
 	// SetupFunc and TeardownFunc are injected at startup; in production
 	// they point at the Linux netlink ops, in tests they're fakes.
 	SetupFunc    func(SetupRequest) error
@@ -49,25 +127,58 @@ func (h *PodHandler) Add(ctx context.Context, req flockcni.Request) (*current.Re
 		return nil, fmt.Errorf("lookup pod: %w", err)
 	}
-	parsed, err := ParseAnnotations(pod.Annotations)
+	nc := h.NodeConfig.Load()
 	defaults := FamilyDefaultsFromNodeConfig(nc)
 	parsed, err := ParseAnnotations(pod.Annotations, defaults)
 	if err != nil {
 		return nil, fmt.Errorf("parse annotations: %w", err)
 	}
 	var nodeAnn map[string]string
 	if nc != nil {
 		nodeAnn = nc.GetAnnotations()
 	}
 	ipAlgo := ResolveIPAlgo(pod.Annotations, nodeAnn, h.Logger)
 	// addresses-annotation IPs replace IPAM allocation for any family they
 	// cover. Plex needs its public IPv4 to be the pod's primary v4 (default
 	// route source, on-link host route, /32 in BGP) — not just an extra IP
 	// layered on top of a private IPAM allocation. Peel one v6 + one v4 out
 	// of Addresses to use as the pod's primary IPs; anything beyond that
 	// stays in addrExtras and gets the existing layered behavior.
 	addrV6, addrV4, addrExtras := splitAddressesPrimary(parsed.Addresses)
 	allocReq := AllocRequest{
 		ContainerID: req.ContainerID,
 		Namespace:   args.PodNamespace,
 		Pod:         args.PodName,
-		WantV6:      parsed.WantV6,
+		App:         deriveAppName(pod),
-		WantV4:      parsed.WantV4,
+		WantV6:      parsed.WantV6 && addrV6 == nil,
 		WantV4:      parsed.WantV4 && addrV4 == nil,
 		AnnCIDR6:    parsed.CIDR6,
 		AnnCIDR4:    parsed.CIDR4,
-		IPAlgo:      parsed.IPAlgo,
+		IPAlgo:      ipAlgo,
 		Image:       podImageRef(pod),
 	}
-	res, err := h.IPAM.Allocate(allocReq)
+	var res AllocResult
 	if allocReq.WantV6 || allocReq.WantV4 {
 		var err error
 		res, err = h.IPAM.Allocate(allocReq)
 		if err != nil {
 			return nil, fmt.Errorf("ipam: %w", err)
 		}
 	}
 	// Promote the peeled addresses IPs into the primary slots. They get the
 	// IPAM-style routing path: bound to eth0 in configurePodSide, default
 	// route via fe80::1 / v4ProxyGW, on-link host route via setHostRoute.
 	// BGP advertisement of the /32/128 is handled by the AnycastReconciler
 	// via renderBird's outside-aggregate detection.
 	if addrV6 != nil {
 		res.IP6 = addrV6
 	}
 	if addrV4 != nil {
 		res.IP4 = addrV4
 	}
 	// Persist pending entry before any netlink work so a crash mid-ADD
 	// leaves recoverable state.
@@ -79,6 +190,7 @@ func (h *PodHandler) Add(ctx context.Context, req flockcni.Request) (*current.Re
 		IP6:         ipString(res.IP6),
 		IP4:         ipString(res.IP4),
 		Anycast:     anycastStrings(parsed.Anycast),
 		Addresses:   anycastStrings(addrExtras),
 		State:       StatePending,
 		AllocatedAt: time.Now().UTC(),
 	}
@@ -95,6 +207,7 @@ func (h *PodHandler) Add(ctx context.Context, req flockcni.Request) (*current.Re
 		IP6:         res.IP6,
 		IP4:         res.IP4,
 		Anycast:     parsed.Anycast,
 		Addresses:   addrExtras,
 	}
 	if err := h.SetupFunc(setup); err != nil {
 		// Roll forward: leave pending entry in place so startup GC can clean
@@ -164,6 +277,11 @@ func resultFromAllocation(ifName string, a Allocation) *current.Result {
 			Address:   net.IPNet{IP: ip4, Mask: net.CIDRMask(32, 32)},
 		})
 	}
 	// Addresses IPs are intentionally excluded from the CNI result.
 	// Kubernetes limits pod.status.podIPs to one IPv4 + one IPv6; any
 	// additional IPs returned here are silently dropped by kubelet. The
 	// addresses IPs are visible inside the pod on eth0 and advertised via
 	// BGP — that is sufficient for workload use.
 	return r
 }
@@ -175,6 +293,33 @@ func ipString(ip net.IP) string {
 	return canonical(ip)
 }
 // splitAddressesPrimary peels off the first IPv6 and first IPv4 from the
 // addresses list to use as the pod's primary IPs in place of an IPAM
 // allocation. The remaining entries (anything beyond the first of each
 // family) stay in extras for the existing layered eth0 binding via the
 // AnycastReconciler's via-route path.
 //
 // Order of the input is preserved in extras. Either of v6/v4 may be nil
 // when the addresses list contains no IP of that family — the caller falls
 // back to IPAM allocation in that case.
 func splitAddressesPrimary(ips []net.IP) (v6, v4 net.IP, extras []net.IP) {
 	for _, ip := range ips {
 		if ip.To4() != nil {
 			if v4 == nil {
 				v4 = ip.To4()
 				continue
 			}
 		} else {
 			if v6 == nil {
 				v6 = ip.To16()
 				continue
 			}
 		}
 		extras = append(extras, ip)
 	}
 	return
 }
 func anycastStrings(ips []net.IP) []string {
 	if len(ips) == 0 {
 		return nil
@@ -0,0 +1,186 @@
 package agent
 import (
 	"net"
 	"testing"
 	corev1 "k8s.io/api/core/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 )
 func ptrBool(b bool) *bool { return &b }
 func mkPod(name string, owner *metav1.OwnerReference, labels map[string]string, image string) *corev1.Pod {
 	p := &corev1.Pod{
 		ObjectMeta: metav1.ObjectMeta{Name: name, Labels: labels},
 	}
 	if owner != nil {
 		p.OwnerReferences = []metav1.OwnerReference{*owner}
 	}
 	if image != "" {
 		p.Spec.Containers = []corev1.Container{{Image: image}}
 	}
 	return p
 }
 func TestDeriveAppName_DeploymentReplicaSet(t *testing.T) {
 	owner := &metav1.OwnerReference{
 		Kind:       "ReplicaSet",
 		Name:       "traefik-789df685f",
 		Controller: ptrBool(true),
 	}
 	pod := mkPod("traefik-789df685f-hqvfl", owner,
 		map[string]string{podTemplateHashLabel: "789df685f"}, "")
 	if got := deriveAppName(pod); got != "traefik" {
 		t.Fatalf("got %q, want %q", got, "traefik")
 	}
 }
 func TestDeriveAppName_StatefulSet(t *testing.T) {
 	owner := &metav1.OwnerReference{
 		Kind:       "StatefulSet",
 		Name:       "gitea",
 		Controller: ptrBool(true),
 	}
 	pod := mkPod("gitea-0", owner, nil, "")
 	if got := deriveAppName(pod); got != "gitea" {
 		t.Fatalf("got %q, want %q", got, "gitea")
 	}
 }
 func TestDeriveAppName_DaemonSet(t *testing.T) {
 	owner := &metav1.OwnerReference{
 		Kind:       "DaemonSet",
 		Name:       "flock-agent",
 		Controller: ptrBool(true),
 	}
 	pod := mkPod("flock-agent-abcde", owner, nil, "")
 	if got := deriveAppName(pod); got != "flock-agent" {
 		t.Fatalf("got %q, want %q", got, "flock-agent")
 	}
 }
 func TestDeriveAppName_BarePod(t *testing.T) {
 	pod := mkPod("standalone", nil, nil, "")
 	if got := deriveAppName(pod); got != "standalone" {
 		t.Fatalf("got %q, want %q", got, "standalone")
 	}
 }
 // TestDeriveAppName_RSWithoutTemplateHash: ReplicaSet owners that don't
 // follow the standard "<deploy>-<hash>" naming convention (e.g. a custom
 // controller) keep the RS name as-is. All replicas of that RS still share
 // one app name, which is the next-best outcome when the Deployment name
 // cannot be recovered.
 func TestDeriveAppName_RSWithoutTemplateHash(t *testing.T) {
 	owner := &metav1.OwnerReference{
 		Kind:       "ReplicaSet",
 		Name:       "weird-rs-name",
 		Controller: ptrBool(true),
 	}
 	pod := mkPod("weird-rs-name-xyz", owner, nil, "")
 	if got := deriveAppName(pod); got != "weird-rs-name" {
 		t.Fatalf("got %q, want %q", got, "weird-rs-name")
 	}
 }
 func TestDeriveAppName_NonControllerOwnerIgnored(t *testing.T) {
 	// OwnerReference without Controller=true must be ignored — only the
 	// controller owner is the canonical workload.
 	owner := &metav1.OwnerReference{
 		Kind: "Foo",
 		Name: "irrelevant",
 		// Controller pointer left nil.
 	}
 	pod := mkPod("solo", owner, nil, "")
 	if got := deriveAppName(pod); got != "solo" {
 		t.Fatalf("got %q, want %q", got, "solo")
 	}
 }
 func TestPodImageRef(t *testing.T) {
 	pod := mkPod("p", nil, nil, "traefik:v3.5")
 	if got := podImageRef(pod); got != "traefik:v3.5" {
 		t.Fatalf("got %q, want %q", got, "traefik:v3.5")
 	}
 	empty := mkPod("p", nil, nil, "")
 	if got := podImageRef(empty); got != "" {
 		t.Fatalf("got %q, want \"\"", got)
 	}
 }
 func TestSplitAddressesPrimary_BothFamilies(t *testing.T) {
 	// Plex pattern: one v6 + one v4 → both peel out, no extras.
 	ips := []net.IP{
 		net.ParseIP("2602:817:3000:c606::166"),
 		net.ParseIP("142.202.202.166"),
 	}
 	v6, v4, extras := splitAddressesPrimary(ips)
 	if v6 == nil || v6.String() != "2602:817:3000:c606::166" {
 		t.Fatalf("v6 = %v", v6)
 	}
 	if v4 == nil || v4.String() != "142.202.202.166" {
 		t.Fatalf("v4 = %v", v4)
 	}
 	if len(extras) != 0 {
 		t.Fatalf("extras = %v, want empty", extras)
 	}
 }
 func TestSplitAddressesPrimary_OnlyV4(t *testing.T) {
 	v6, v4, extras := splitAddressesPrimary([]net.IP{net.ParseIP("142.202.202.166")})
 	if v6 != nil {
 		t.Fatalf("v6 should be nil, got %v", v6)
 	}
 	if v4 == nil || v4.String() != "142.202.202.166" {
 		t.Fatalf("v4 = %v", v4)
 	}
 	if len(extras) != 0 {
 		t.Fatalf("extras = %v", extras)
 	}
 }
 func TestSplitAddressesPrimary_OnlyV6(t *testing.T) {
 	v6, v4, extras := splitAddressesPrimary([]net.IP{net.ParseIP("2602:817:3000:c606::166")})
 	if v4 != nil {
 		t.Fatalf("v4 should be nil, got %v", v4)
 	}
 	if v6 == nil || v6.String() != "2602:817:3000:c606::166" {
 		t.Fatalf("v6 = %v", v6)
 	}
 	if len(extras) != 0 {
 		t.Fatalf("extras = %v", extras)
 	}
 }
 func TestSplitAddressesPrimary_Empty(t *testing.T) {
 	v6, v4, extras := splitAddressesPrimary(nil)
 	if v6 != nil || v4 != nil || extras != nil {
 		t.Fatalf("nil input should yield nil outputs, got v6=%v v4=%v extras=%v", v6, v4, extras)
 	}
 }
 func TestSplitAddressesPrimary_Extras(t *testing.T) {
 	// Multiple v4s — only the first peels into the primary slot; the rest
 	// stay in extras for layered-eth0 binding via the AnycastReconciler.
 	// (Not a current production use case, but the code should handle it
 	// without dropping IPs.)
 	ips := []net.IP{
 		net.ParseIP("142.202.202.166"),
 		net.ParseIP("2602:817:3000:c606::166"),
 		net.ParseIP("142.202.202.167"),
 		net.ParseIP("2602:817:3000:c606::167"),
 	}
 	v6, v4, extras := splitAddressesPrimary(ips)
 	if v4.String() != "142.202.202.166" {
 		t.Fatalf("v4 primary = %v, want 142.202.202.166", v4)
 	}
 	if v6.String() != "2602:817:3000:c606::166" {
 		t.Fatalf("v6 primary = %v, want 2602:817:3000:c606::166", v6)
 	}
 	if len(extras) != 2 {
 		t.Fatalf("extras len = %d, want 2", len(extras))
 	}
 	if extras[0].String() != "142.202.202.167" || extras[1].String() != "2602:817:3000:c606::167" {
 		t.Fatalf("extras order/content wrong: %v", extras)
 	}
 }
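The `deriveAppName` implementation is not part of this excerpt, so the following is only a sketch of behavior that would satisfy the tests above. The `owner` and `pod` structs are simplified stand-ins for the Kubernetes types, and the value of `podTemplateHashLabel` is assumed to be the standard `pod-template-hash` label key.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumption: the constant referenced by the tests holds the standard label key.
const podTemplateHashLabel = "pod-template-hash"

// owner and pod are simplified, hypothetical stand-ins for
// metav1.OwnerReference and corev1.Pod.
type owner struct {
	Kind, Name string
	Controller bool
}

type pod struct {
	Name   string
	Labels map[string]string
	Owners []owner
}

// appName returns a stable workload name: the controlling owner's name, with
// the "-<hash>" suffix stripped for ReplicaSets that carry a template hash,
// or the pod name when there is no controlling owner.
func appName(p pod) string {
	for _, o := range p.Owners {
		if !o.Controller {
			continue // only the controller owner identifies the workload
		}
		if o.Kind == "ReplicaSet" {
			if h := p.Labels[podTemplateHashLabel]; h != "" {
				return strings.TrimSuffix(o.Name, "-"+h)
			}
		}
		return o.Name
	}
	return p.Name
}

func main() {
	deploy := pod{
		Name:   "traefik-789df685f-hqvfl",
		Labels: map[string]string{podTemplateHashLabel: "789df685f"},
		Owners: []owner{{Kind: "ReplicaSet", Name: "traefik-789df685f", Controller: true}},
	}
	bare := pod{Name: "standalone"}
	fmt.Println(appName(deploy), appName(bare))
}
```

Keying on the workload rather than the pod name means every replica of a Deployment feeds the same value into the `FieldApp` embed field, regardless of its random pod suffix.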
@@ -0,0 +1,63 @@
 package agent
 import (
 	"strings"
 	"testing"
 )
 func TestHostIfaceName_Format(t *testing.T) {
 	got := HostIfaceName("0123456789abcdef0123456789abcdef")
 	if !strings.HasPrefix(got, "flock") || len(got) != len("flock")+8 {
 		t.Fatalf("HostIfaceName=%q (want flock + 8 hex)", got)
 	}
 }
 func TestHostIfaceName_Determinism(t *testing.T) {
 	a := HostIfaceName("container-xyz")
 	b := HostIfaceName("container-xyz")
 	if a != b {
 		t.Fatalf("not deterministic: %s vs %s", a, b)
 	}
 }
 func TestHostIfaceName_DifferentInputs(t *testing.T) {
 	a := HostIfaceName("a")
 	b := HostIfaceName("b")
 	if a == b {
 		t.Fatalf("collision on trivial inputs")
 	}
 }
 // FuzzHostIfaceName ensures the host interface name generator never produces
 // an output longer than IFNAMSIZ-1 (15 chars on Linux) and never panics.
 // The name format is "flock" + 8 hex chars = 13 chars, always.
 func FuzzHostIfaceName(f *testing.F) {
 	f.Add("")
 	f.Add("a")
 	f.Add("/var/run/netns/abc")
 	f.Add("0123456789abcdef0123456789abcdef")
 	f.Add(longString("x", 64*1024)) // very long containerID
 	f.Add("\x00\x00\x00")
 	f.Add("ünïcødé/контейнер")
 	f.Fuzz(func(t *testing.T, id string) {
 		got := HostIfaceName(id)
 		// Linux IFNAMSIZ is 16 (15 chars + NUL); ours must fit comfortably.
 		if len(got) > 15 {
 			t.Fatalf("HostIfaceName(%q)=%q exceeds 15 chars", id, got)
 		}
 		if !strings.HasPrefix(got, "flock") {
 			t.Fatalf("HostIfaceName(%q)=%q missing prefix", id, got)
 		}
 		// Suffix must be lowercase hex (8 chars).
 		suffix := got[len("flock"):]
 		if len(suffix) != 8 {
 			t.Fatalf("HostIfaceName(%q) suffix len=%d", id, len(suffix))
 		}
 		for _, c := range suffix {
 			if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')) {
 				t.Fatalf("HostIfaceName(%q)=%q has non-hex suffix", id, got)
 			}
 		}
 	})
 }
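The tests and fuzz target above pin down the shape of the name (`flock` plus 8 lowercase hex characters, deterministic, at most 15 characters) but not how it is derived. One way to meet those properties, shown purely as an assumption, is to hex-encode the first four bytes of a SHA-256 digest of the container ID; the real `HostIfaceName` may use a different hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hostIfaceName is a hypothetical stand-in for HostIfaceName. It hashes the
// container ID and keeps 4 bytes (8 hex chars), so the result is always
// 13 characters and fits inside Linux's 15-character interface name limit.
func hostIfaceName(containerID string) string {
	sum := sha256.Sum256([]byte(containerID))
	return "flock" + hex.EncodeToString(sum[:4])
}

func main() {
	for _, id := range []string{"", "a", "0123456789abcdef0123456789abcdef"} {
		name := hostIfaceName(id)
		fmt.Printf("%q -> %s (%d chars)\n", id, name, len(name))
	}
}
```

Hashing, rather than truncating the container ID, keeps the output length fixed for arbitrary input, including the empty string and non-ASCII IDs that the fuzz seeds exercise.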
@@ -62,25 +62,36 @@ func (cryptoRand) PickIndex(n int) int {
 }
 // AllocRequest describes a pending allocation. Values come from Pod metadata
-// + annotations at CNI ADD time.
+// + annotations at CNI ADD time, with per-node FamilyDefaults already merged
 // in (see ParseAnnotations).
 type AllocRequest struct {
 	ContainerID string
 	Namespace   string
 	// Pod is the literal pod name (used for logging only — not embedded).
 	Pod string
-	// WantV6 / WantV4 come from the ipv6 / ipv4 annotations (defaults in
+	// App is the stable workload identity for the FieldApp embed field —
-	// design doc: ipv6=true, ipv4=false).
+	// typically the owning Deployment / StatefulSet / DaemonSet name.
 	// Computed by the handler; falls back to Pod when no usable owner is
 	// found (bare pods).
 	App string
 	// WantV6 / WantV4 are the post-merge address family selection (pod
 	// annotation > NodeConfig.Spec.Defaults > built-in baseline of
 	// dual-stack). At least one MUST be true; Allocate rejects the request
 	// otherwise.
 	WantV6 bool
 	WantV4 bool
 	// AnnCIDR6 / AnnCIDR4 come from the cidr6 / cidr4 annotations. Empty
 	// means "use any of the node's CIDRs".
 	AnnCIDR6 []*net.IPNet
 	AnnCIDR4 []*net.IPNet
-	// IPAlgo comes from the ip-algo annotation. Empty means random IID.
+	// IPAlgo comes from the resolved ip-algo precedence chain. Empty means
 	// random IID.
 	IPAlgo []embed.Field
-	// ImageDigest is the sha256 manifest digest (with or without "sha256:"
+	// Image is the spec'd image reference (typically
-	// prefix). If empty, embed.Values.ImageFallback = ContainerID is used
+	// pod.Spec.Containers[0].Image). When 64 hex chars, treated as a
-	// for ip-algo fields that reference image.
+	// sha256 digest; otherwise FNV-1a-64'd as a string. Empty falls back
-	ImageDigest string
+	// to FNV(ContainerID) for ip-algo fields that reference image.
 	Image string
 }
 // AllocResult is what the IPAM hands back to the CNI ADD.
@@ -207,8 +218,8 @@ func (i *IPAM) allocV6(cidr *net.IPNet, req AllocRequest) (net.IP, error) {
 		} else {
 			ip, err = embed.Embed(cidr, req.IPAlgo, embed.Values{
 				Namespace:     req.Namespace,
-				Pod:           req.Pod,
+				App:           req.App,
-				Image:         req.ImageDigest,
+				Image:         req.Image,
 				ImageFallback: req.ContainerID,
 			}, i.randSrc.NibbleN())
 		}
@@ -224,34 +235,36 @@ func (i *IPAM) allocV6(cidr *net.IPNet, req AllocRequest) (net.IP, error) {
 // randomV6 picks a random /128 inside cidr. The network prefix bits are
 // preserved from cidr.IP; the host bits are filled from the random source.
 //
 // Implementation: walk the 16 IPv6 bytes once. For each byte we ask whether
 // it's entirely inside the network mask (skip), entirely inside the host
 // portion (overwrite with random), or split (combine bits from both).
 func (i *IPAM) randomV6(cidr *net.IPNet) (net.IP, error) {
 	ones, bits := cidr.Mask.Size()
 	if bits != 128 {
 		return nil, fmt.Errorf("cidr %s is not IPv6", cidr)
 	}
-	out := make(net.IP, 16)
+	out := make(net.IP, net.IPv6len)
 	copy(out, cidr.IP.To16())
-	hostBits := 128 - ones
+	rnd := make([]byte, net.IPv6len)
 	rnd := make([]byte, 16)
 	i.randSrc.FillIID(rnd)
-	// Merge rnd into out where mask bit is 0.
+	for b := 0; b < net.IPv6len; b++ {
 	for b := 0; b < 16; b++ {
 		// Host bits start at bit index `ones`, byte `b`.
 		byteStart := b * 8
 		byteEnd := byteStart + 8
-		if byteEnd <= ones {
+		switch {
-			continue // entirely network
+		case byteEnd <= ones:
-		}
+			// Entirely inside the network prefix — leave untouched.
 		if byteStart >= ones {
 			out[b] = rnd[b] // entirely host
 			continue
-		}
+		case byteStart >= ones:
-		// Split byte: top (ones-byteStart) bits are network, rest is host.
+			// Entirely inside the host portion — fully randomise.
 			out[b] = rnd[b]
 		default:
 			// Split byte: top (ones-byteStart) bits are network, rest host.
 			networkBits := ones - byteStart
 			hostMask := byte(0xFF) >> uint(networkBits)
 			out[b] = (out[b] & ^hostMask) | (rnd[b] & hostMask)
 		}
-	_ = hostBits
+	}
 	return out, nil
 }
@@ -360,15 +373,34 @@ func toStringSlice(ns []*net.IPNet) []string {
 	return out
 }
 // canonical returns the textual form of ip in its native family, so the same
 // host address is always represented identically regardless of whether it
 // arrived as a 4-byte slice, a 16-byte v4-in-v6 slice, or a string-parsed
 // net.IP. Used as the key for the in-use map.
 //
 // Returns "" for nil input — callers MUST treat the returned key as opaque
 // and never use the empty string as a sentinel.
 func canonical(ip net.IP) string {
 	if ip == nil {
 		return ""
 	}
 	if v4 := ip.To4(); v4 != nil {
 		return v4.String()
 	}
-	return ip.To16().String()
+	if v16 := ip.To16(); v16 != nil {
 		return v16.String()
 	}
 	return ""
 }
 // ipToU32 reads a 4-byte IPv4 net.IP into a uint32. The caller is expected
 // to have already validated that ip is an IPv4 address; mis-use returns 0
 // rather than panicking.
 func ipToU32(ip net.IP) uint32 {
 	v4 := ip.To4()
 	if v4 == nil {
 		return 0
 	}
 	return uint32(v4[0])<<24 | uint32(v4[1])<<16 | uint32(v4[2])<<8 | uint32(v4[3])
 }
@@ -0,0 +1,169 @@
 package agent
 import (
 	"net"
 	"testing"
 )
 // FuzzIPAM_Allocate runs randomly-driven Allocate/Release sequences against
 // a /120 IPv6 + /28 IPv4 IPAM so the fuzzer can hit address exhaustion.
 //
 // Properties checked:
 //
 //  1. Allocate never panics regardless of the action stream.
 //  2. The set of in-use addresses never contains an address that has been
 //     released without a subsequent successful Allocate.
 //  3. A successful v6 allocation always yields an address inside the
 //     configured /120, and a successful v4 always inside the configured /28.
 //  4. A successful v4 allocation is always a 4-byte address and never
 //     lands on .0 (network), .1 (gateway), or .15 (broadcast) of the /28.
 //
 // The fuzzed bytes are interpreted as an opcode stream:
 //   - bytes[i] & 0x03 selects the action: 0=alloc-v6, 1=alloc-v4,
 //     2=alloc-dual, 3=release-most-recent.
 //   - the same bytes are also replayed as the deterministic random
 //     source (nibble stream and IID seeds), so different inputs drive
 //     different IID/index choices.
 func FuzzIPAM_Allocate(f *testing.F) {
 	f.Add([]byte{0, 0, 0, 0})
 	f.Add([]byte{1, 1, 1, 1})
 	f.Add([]byte{2, 2, 2, 2})
 	f.Add([]byte{0, 1, 2, 3})
 	f.Add([]byte(longString("\x00\x01\x02\x03", 256)))
 	f.Fuzz(func(t *testing.T, ops []byte) {
 		ipam, err := NewIPAM(
 			[]string{"2001:db8::/120"}, // 256 host slots; 16 bytes of fuzzed nibbles
 			[]string{"10.0.0.0/28"},   // 13 allocatable hosts (.2-.14); .0, .1, .15 are reserved
 		)
 		if err != nil {
 			t.Fatal(err)
 		}
 		// Deterministic source: replay nibbles cycled from `ops`.
 		fr := &fakeRand{
 			nibbles: append([]byte{}, ops...),
 			iids: [][]byte{
 				// 16 bytes of "host portion" — only the last byte matters
 				// for a /120 prefix.
 				makeIID(ops, 0),
 				makeIID(ops, 1),
 				makeIID(ops, 2),
 				makeIID(ops, 3),
 			},
 		}
 		if len(fr.nibbles) == 0 {
 			fr.nibbles = []byte{0}
 		}
 		ipam.randSrc = fr
 		net6 := mustNet(t, "2001:db8::/120")
 		net4 := mustNet(t, "10.0.0.0/28")
 		var live []AllocResult
 		seen := map[string]struct{}{}
 		for idx, op := range ops {
 			req := AllocRequest{ContainerID: idStr(idx)}
 			switch op & 0x03 {
 			case 0:
 				req.WantV6 = true
 			case 1:
 				req.WantV4 = true
 			case 2:
 				req.WantV6, req.WantV4 = true, true
 			case 3:
 				if len(live) == 0 {
 					continue
 				}
 				rel := live[len(live)-1]
 				live = live[:len(live)-1]
 				ipam.Release(rel.IP6, rel.IP4)
 				delete(seen, canonical(rel.IP6))
 				delete(seen, canonical(rel.IP4))
 				continue
 			}
 			res, err := ipam.Allocate(req)
 			if err != nil {
 				continue // exhaustion is acceptable
 			}
 			if req.WantV6 {
 				if res.IP6 == nil {
 					t.Fatalf("requested v6 but got nil")
 				}
 				if !net6.Contains(res.IP6) {
 					t.Fatalf("v6 %s outside /120", res.IP6)
 				}
 				if _, dup := seen[canonical(res.IP6)]; dup {
 					t.Fatalf("v6 %s duplicated", res.IP6)
 				}
 				seen[canonical(res.IP6)] = struct{}{}
 			}
 			if req.WantV4 {
 				if res.IP4 == nil {
 					t.Fatalf("requested v4 but got nil")
 				}
 				if !net4.Contains(res.IP4) {
 					t.Fatalf("v4 %s outside /28", res.IP4)
 				}
 				v4 := res.IP4.To4()
 				if v4 == nil {
 					t.Fatalf("v4 result not 4-byte: %s", res.IP4)
 				}
 				// Skip .0 (network) and .15 (broadcast). The allocator
 				// should also skip .1 (gateway) by convention.
 				last := v4[3]
 				if last == 0 || last == 1 || last == 15 {
 					t.Fatalf("v4 %s in reserved range", res.IP4)
 				}
 				if _, dup := seen[canonical(res.IP4)]; dup {
 					t.Fatalf("v4 %s duplicated", res.IP4)
 				}
 				seen[canonical(res.IP4)] = struct{}{}
 			}
 			live = append(live, res)
 		}
 	})
 }
 // FuzzCanonical asserts that canonical never panics and is idempotent.
 func FuzzCanonical(f *testing.F) {
 	f.Add([]byte{})
 	f.Add([]byte{1, 2, 3, 4})
 	f.Add([]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0})
 	f.Add([]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff, 10, 0, 0, 1}) // v4-mapped v6
 	f.Add([]byte{0xff})
 	f.Fuzz(func(t *testing.T, b []byte) {
 		ip := net.IP(b)
 		s1 := canonical(ip)
 		// Idempotent: re-canonicalising the parsed form yields the same
 		// string for any non-empty result.
 		if s1 != "" {
 			parsed := net.ParseIP(s1)
 			if parsed == nil {
 				t.Fatalf("canonical(%v)=%q is not parseable as IP", b, s1)
 			}
 			if got := canonical(parsed); got != s1 {
 				t.Fatalf("not idempotent: %q -> %q", s1, got)
 			}
 		}
 	})
 }
 func makeIID(seed []byte, salt byte) []byte {
 	out := make([]byte, net.IPv6len)
 	for i := range out {
 		if i < len(seed) {
 			out[i] = seed[i] ^ salt
 		} else {
 			out[i] = salt
 		}
 	}
 	return out
 }
 func idStr(i int) string {
 	const hex = "0123456789abcdef"
 	return string([]byte{'c', '-', hex[(i>>4)&0xF], hex[i&0xF]})
 }
@@ -148,8 +148,8 @@ func TestIPAM_AllocV6_WithEmbed(t *testing.T) {
 	}
 	i.randSrc = &fakeRand{nibbles: []byte{0xe}}
 	res, err := i.Allocate(AllocRequest{
-		ContainerID: "c1", Namespace: "mail", Pod: "stalwart-0", WantV6: true,
+		ContainerID: "c1", Namespace: "mail", Pod: "stalwart-0", App: "stalwart", WantV6: true,
-		IPAlgo: []embed.Field{embed.FieldNamespace, embed.FieldPod, embed.FieldImage},
+		IPAlgo: []embed.Field{embed.FieldNamespace, embed.FieldApp, embed.FieldImage},
 	})
 	if err != nil {
 		t.Fatalf("Allocate: %v", err)
@@ -25,6 +25,11 @@ type SetupRequest struct {
 	// Host /128 and /32 routes are NOT installed here — that happens once
 	// the pod becomes Ready, see AnycastReconciler.
 	Anycast []net.IP
 	// Addresses are additional IPs to bind directly on pod eth0 (NOT lo).
 	// BGP advertisement is handled identically to Anycast by the
 	// AnycastReconciler. Use when the workload needs the IP on its primary
 	// interface (e.g. Plex remote-access detection).
 	Addresses []net.IP
 }
 // LinkLocalGW is the deterministic IPv6 LL gateway placed on every host
@@ -269,6 +274,23 @@ func configurePodSide(req SetupRequest) error {
 			}
 		}
 		// Addresses: assign directly to pod eth0. Host routing and BGP
 		// advertisement are handled identically to Anycast by the
 		// AnycastReconciler (host route via pod-eth0-ip, /128+/32 in BIRD).
 		for _, ip := range req.Addresses {
 			var mask net.IPMask
 			if ip.To4() != nil {
 				mask = net.CIDRMask(32, 32)
 				ip = ip.To4()
 			} else {
 				mask = net.CIDRMask(128, 128)
 			}
 			a := &netlink.Addr{IPNet: &net.IPNet{IP: ip, Mask: mask}, Scope: int(netlink.SCOPE_UNIVERSE)}
 			if err := netlink.AddrAdd(eth0, a); err != nil && !errors.Is(err, os.ErrExist) {
 				return fmt.Errorf("pod eth0 address %s: %w", ip, err)
 			}
 		}
 		return nil
 	})
 }
@@ -16,6 +16,7 @@ type SetupRequest struct {
 	IP6         net.IP
 	IP4         net.IP
 	Anycast     []net.IP
 	Addresses   []net.IP
 }
 // Setup is unimplemented on non-Linux platforms; the agent only runs in
@@ -0,0 +1,85 @@
 //go:build linux
 package netpol
 import (
 	"bytes"
 	"context"
 	"fmt"
 	"os/exec"
 	"time"
 )
 // Applier hands rendered nft scripts to the kernel via `nft -f -`.
 // nftables guarantees the entire script applies atomically — if any line
 // is rejected, the previous ruleset stays intact.
 //
 // Applier maintains the last-applied script string and skips the exec
 // when the new render is byte-identical, so a 5s reconcile tick on a
 // quiet cluster is cheap.
 type Applier struct {
 	// NftPath is the path to the nft binary. Empty means "look up `nft`
 	// on PATH". Tests set this to a fake.
 	NftPath string
 	// Timeout bounds an individual nft invocation; if zero, defaults to
 	// 5 seconds.
 	Timeout time.Duration
 	last string
 }
 // Apply runs `nft -f -` with the supplied script. Idempotent: if script
 // equals the last successful application, this is a no-op.
 //
 // Returns an error from nft (with stderr captured) if the script is
 // malformed or the kernel rejects it.
 func (a *Applier) Apply(ctx context.Context, script string) error {
 	if script == a.last {
 		return nil
 	}
 	timeout := a.Timeout
 	if timeout == 0 {
 		timeout = 5 * time.Second
 	}
 	bin := a.NftPath
 	if bin == "" {
 		bin = "nft"
 	}
 	cctx, cancel := context.WithTimeout(ctx, timeout)
 	defer cancel()
 	cmd := exec.CommandContext(cctx, bin, "-f", "-")
 	cmd.Stdin = bytes.NewBufferString(script)
 	var stderr bytes.Buffer
 	cmd.Stderr = &stderr
 	if err := cmd.Run(); err != nil {
 		return fmt.Errorf("nft -f -: %w: %s", err, stderr.String())
 	}
 	a.last = script
 	return nil
 }
 // Clear tears down the flock NetworkPolicy table — used by graceful
 // shutdown so a stopping agent doesn't leave stale enforcement behind.
 // Best-effort: if nft is missing or the table doesn't exist, returns
 // nil.
 func (a *Applier) Clear(ctx context.Context) error {
 	timeout := a.Timeout
 	if timeout == 0 {
 		timeout = 5 * time.Second
 	}
 	bin := a.NftPath
 	if bin == "" {
 		bin = "nft"
 	}
 	cctx, cancel := context.WithTimeout(ctx, timeout)
 	defer cancel()
 	cmd := exec.CommandContext(cctx, bin, "destroy", "table", "inet", "flock_netpol")
 	if err := cmd.Run(); err != nil {
 		// nft returns non-zero if the table doesn't exist — that's a
 		// success for our purposes.
 		return nil
 	}
 	a.last = ""
 	return nil
 }
@@ -0,0 +1,16 @@
 //go:build !linux
 package netpol
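 import (
 	"context"
 	"time"
 )
 // Applier is a no-op on non-Linux build hosts so unit tests run on macOS
 // without nft. Field types match the Linux implementation so callers
 // compile identically on both platforms.
 type Applier struct {
 	NftPath string
 	Timeout time.Duration
 	last    string
 }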
 func (a *Applier) Apply(_ context.Context, script string) error { a.last = script; return nil }
 func (a *Applier) Clear(_ context.Context) error                { a.last = ""; return nil }
@@ -0,0 +1,250 @@
 package netpol
 import (
 	"net"
 	"strings"
 	"testing"
 	corev1 "k8s.io/api/core/v1"
 	netv1 "k8s.io/api/networking/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/util/intstr"
 )
 // These fixtures mirror the three NetworkPolicies live in the sjc001
 // cluster on 2026-04-25. They serve as integration-shaped tests: the
 // translator + renderer must produce a sensible nft script for each.
 //
 // Source of truth (refresh by running `kubectl get netpol -A -o yaml`):
 //
 //   - calico-apiserver/allow-apiserver
 //   - remote-proxies/lodge-home-assistant-ingress
 //   - storage/garage-admin-restrict
 // allowApiserverPolicy: TCP/5443 ingress to apiserver=true pods, no peer
 // restriction (allow-from-anywhere on that port).
 func allowApiserverPolicy() netv1.NetworkPolicy {
 	tcp := corev1.ProtocolTCP
 	port := intstr.FromInt32(5443)
 	return netv1.NetworkPolicy{
 		ObjectMeta: metav1.ObjectMeta{Namespace: "calico-apiserver", Name: "allow-apiserver"},
 		Spec: netv1.NetworkPolicySpec{
 			PodSelector: metav1.LabelSelector{MatchLabels: map[string]string{"apiserver": "true"}},
 			PolicyTypes: []netv1.PolicyType{netv1.PolicyTypeIngress},
 			Ingress: []netv1.NetworkPolicyIngressRule{{
 				Ports: []netv1.NetworkPolicyPort{{Protocol: &tcp, Port: &port}},
 			}},
 		},
 	}
 }
 // lodgeHomeAssistantPolicy: TCP/8080 from any pod in the `edge` namespace
 // to pods labelled app=lodge-home-assistant.
 func lodgeHomeAssistantPolicy() netv1.NetworkPolicy {
 	tcp := corev1.ProtocolTCP
 	port := intstr.FromInt32(8080)
 	return netv1.NetworkPolicy{
 		ObjectMeta: metav1.ObjectMeta{Namespace: "remote-proxies", Name: "lodge-home-assistant-ingress"},
 		Spec: netv1.NetworkPolicySpec{
 			PodSelector: metav1.LabelSelector{MatchLabels: map[string]string{"app": "lodge-home-assistant"}},
 			PolicyTypes: []netv1.PolicyType{netv1.PolicyTypeIngress},
 			Ingress: []netv1.NetworkPolicyIngressRule{{
 				From: []netv1.NetworkPolicyPeer{{
 					NamespaceSelector: &metav1.LabelSelector{
 						MatchLabels: map[string]string{"kubernetes.io/metadata.name": "edge"},
 					},
 				}},
 				Ports: []netv1.NetworkPolicyPort{{Protocol: &tcp, Port: &port}},
 			}},
 		},
 	}
 }
 // garageAdminPolicy: complex two-rule policy.
 //
 //  1. Allow TCP/{3900, 80, 3901} from anywhere.
 //  2. Allow TCP/3903 only from pods in `edge` or `storage`.
 func garageAdminPolicy() netv1.NetworkPolicy {
 	tcp := corev1.ProtocolTCP
 	p3900 := intstr.FromInt32(3900)
 	p80 := intstr.FromInt32(80)
 	p3901 := intstr.FromInt32(3901)
 	p3903 := intstr.FromInt32(3903)
 	return netv1.NetworkPolicy{
 		ObjectMeta: metav1.ObjectMeta{Namespace: "storage", Name: "garage-admin-restrict"},
 		Spec: netv1.NetworkPolicySpec{
 			PodSelector: metav1.LabelSelector{MatchLabels: map[string]string{"app": "garage"}},
 			PolicyTypes: []netv1.PolicyType{netv1.PolicyTypeIngress},
 			Ingress: []netv1.NetworkPolicyIngressRule{
 				{
 					Ports: []netv1.NetworkPolicyPort{
 						{Protocol: &tcp, Port: &p3900},
 						{Protocol: &tcp, Port: &p80},
 						{Protocol: &tcp, Port: &p3901},
 					},
 				},
 				{
 					From: []netv1.NetworkPolicyPeer{
 						{NamespaceSelector: &metav1.LabelSelector{
 							MatchLabels: map[string]string{"kubernetes.io/metadata.name": "edge"},
 						}},
 						{NamespaceSelector: &metav1.LabelSelector{
 							MatchLabels: map[string]string{"kubernetes.io/metadata.name": "storage"},
 						}},
 					},
 					Ports: []netv1.NetworkPolicyPort{{Protocol: &tcp, Port: &p3903}},
 				},
 			},
 		},
 	}
 }
 // TestClusterFixture_AllowApiserver — pod selected by the policy gets
 // isolated; the rendered script accepts TCP/5443 from anywhere.
 func TestClusterFixture_AllowApiserver(t *testing.T) {
 	pod := Pod{
 		Namespace: "calico-apiserver",
 		Name:      "calico-apiserver-1",
 		Labels:    map[string]string{"apiserver": "true"},
 		HostIface: "flock00000001",
 		IPs:       []net.IP{mustIP("2001:db8::1")},
 	}
 	out, err := Translate(Inputs{
 		LocalPods: []Pod{pod},
 		Policies:  []netv1.NetworkPolicy{allowApiserverPolicy()},
 	}, nil)
 	if err != nil {
 		t.Fatal(err)
 	}
 	in, _ := isolationFor(out, "calico-apiserver/calico-apiserver-1")
 	if !in {
 		t.Fatalf("apiserver pod should be isolated for ingress")
 	}
 	script := Render(out)
 	if !strings.Contains(script, "tcp dport 5443 accept") {
 		t.Fatalf("expected TCP/5443 allow:\n%s", script)
 	}
 	// No peer filter — allow-all-on-port.
 	if strings.Contains(script, "ip6 saddr {") || strings.Contains(script, "ip saddr {") {
 		t.Fatalf("expected no peer filter for allow-from-anywhere:\n%s", script)
 	}
 }
 // TestClusterFixture_LodgeHomeAssistant — pod isolated; only TCP/8080
 // from edge namespace is allowed.
 func TestClusterFixture_LodgeHomeAssistant(t *testing.T) {
 	pod := Pod{
 		Namespace: "remote-proxies",
 		Name:      "lodge-home-assistant-0",
 		Labels:    map[string]string{"app": "lodge-home-assistant"},
 		HostIface: "flock00000002",
 		IPs:       []net.IP{mustIP("2001:db8::2")},
 	}
 	traefik := PeerPod{
 		Namespace: "edge", Name: "traefik-0",
 		Labels: map[string]string{"app": "traefik"},
 		IPs:    []net.IP{mustIP("2001:db8::aa")},
 	}
 	stranger := PeerPod{
 		Namespace: "default", Name: "random",
 		Labels: map[string]string{"app": "random"},
 		IPs:    []net.IP{mustIP("2001:db8::bb")},
 	}
 	out, err := Translate(Inputs{
 		LocalPods: []Pod{pod},
 		PeerPods:  []PeerPod{traefik, stranger},
 		Namespaces: []Namespace{
 			{Name: "edge", Labels: map[string]string{"kubernetes.io/metadata.name": "edge"}},
 			{Name: "default", Labels: map[string]string{"kubernetes.io/metadata.name": "default"}},
 			{Name: "remote-proxies", Labels: map[string]string{"kubernetes.io/metadata.name": "remote-proxies"}},
 		},
 		Policies: []netv1.NetworkPolicy{lodgeHomeAssistantPolicy()},
 	}, nil)
 	if err != nil {
 		t.Fatal(err)
 	}
 	if len(out.Rules) != 1 {
 		t.Fatalf("expected 1 rule, got %d", len(out.Rules))
 	}
 	r := out.Rules[0]
 	// Peer should be exactly traefik's IP, not stranger's.
 	got := map[string]bool{}
 	for _, c := range r.PeerCIDRs {
 		got[c.IP.String()] = true
 	}
 	if !got["2001:db8::aa"] {
 		t.Fatalf("traefik IP missing from rule: %v", got)
 	}
 	if got["2001:db8::bb"] {
 		t.Fatalf("stranger IP leaked into rule")
 	}
 	script := Render(out)
 	if !strings.Contains(script, "tcp dport 8080 accept") {
 		t.Fatalf("expected TCP/8080 allow:\n%s", script)
 	}
 }
 // TestClusterFixture_Garage — verifies the two-rule policy:
 //
 //  1. ports {3900, 80, 3901} accept from any peer
 //  2. port 3903 accept only from edge or storage namespaces
 func TestClusterFixture_Garage(t *testing.T) {
 	pod := Pod{
 		Namespace: "storage", Name: "garage-0",
 		Labels:    map[string]string{"app": "garage"},
 		HostIface: "flock00000003",
 		IPs:       []net.IP{mustIP("2001:db8::3")},
 	}
 	storagePeer := PeerPod{
 		Namespace: "storage", Name: "garage-1",
 		Labels: map[string]string{"app": "garage"},
 		IPs:    []net.IP{mustIP("2001:db8::31")},
 	}
 	edgePeer := PeerPod{
 		Namespace: "edge", Name: "traefik-0",
 		Labels: map[string]string{"app": "traefik"},
 		IPs:    []net.IP{mustIP("2001:db8::41")},
 	}
 	stranger := PeerPod{
 		Namespace: "default", Name: "random",
 		Labels: map[string]string{"app": "random"},
 		IPs:    []net.IP{mustIP("2001:db8::ff")},
 	}
 	out, err := Translate(Inputs{
 		LocalPods: []Pod{pod},
 		PeerPods:  []PeerPod{storagePeer, edgePeer, stranger},
 		Namespaces: []Namespace{
 			{Name: "edge", Labels: map[string]string{"kubernetes.io/metadata.name": "edge"}},
 			{Name: "storage", Labels: map[string]string{"kubernetes.io/metadata.name": "storage"}},
 			{Name: "default", Labels: map[string]string{"kubernetes.io/metadata.name": "default"}},
 		},
 		Policies: []netv1.NetworkPolicy{garageAdminPolicy()},
 	}, nil)
 	if err != nil {
 		t.Fatal(err)
 	}
 	// Two ingress rules in the source policy → two Rules out (one per
 	// peer set, ports inline).
 	if len(out.Rules) != 2 {
 		t.Fatalf("expected 2 rules (one per ingress entry), got %d", len(out.Rules))
 	}
 	script := Render(out)
 	for _, want := range []string{
 		"tcp dport 3900 accept",
 		"tcp dport 80 accept",
 		"tcp dport 3901 accept",
 		"tcp dport 3903 accept",
 	} {
 		if !strings.Contains(script, want) {
 			t.Errorf("missing %q in script:\n%s", want, script)
 		}
 	}
 	// The 3903 rule must carry a peer filter for both edge and storage
 	// peer IPs but not the stranger.
 	if !strings.Contains(script, "2001:db8::31/128") || !strings.Contains(script, "2001:db8::41/128") {
 		t.Fatalf("expected edge+storage peer IPs in 3903 rule:\n%s", script)
 	}
 	if strings.Contains(script, "2001:db8::ff/128") {
 		t.Fatalf("stranger IP must not appear:\n%s", script)
 	}
 }
@@ -0,0 +1,44 @@
 // Package netpol implements Kubernetes NetworkPolicy enforcement for flock.
 //
 // # Model
 //
 // NetworkPolicy is a Kubernetes-native API (`networking.k8s.io/v1`) that
 // describes which pods may receive traffic (Ingress) and / or initiate
 // traffic (Egress). The semantics are isolation by selection: a pod that is
 // selected by *any* NetworkPolicy in a given direction becomes default-deny
 // in that direction, and only traffic matching the union of the "allow"
 // rules from every policy that selects it is permitted. A pod selected by
 // no policy is unrestricted.
 //
 // flock enforces these semantics with nftables. Each agent is responsible
 // for the pods scheduled on its own node — peer addresses (from
 // podSelector / namespaceSelector / ipBlock peers) come from a cluster-wide
 // informer set so the agent can resolve peers that live elsewhere.
 //
 // # Pipeline
 //
 // The work is split into four stages with hard boundaries between them so
 // each can be tested in isolation:
 //
 //  1. Informers (informers.go) — watch NetworkPolicies, Namespaces, and
 //     all Pods in the cluster. Maintain indices the translator can query.
 //
 //  2. Translator (translator.go) — pure function from
 //     (NetworkPolicy set, Namespace set, Pod set, local-node pod set) to
 //     []Rule. No I/O, no hidden state — straightforward to fuzz and unit
 //     test. Implements the default-deny semantics and the peer-resolution
 //     rules from the NetworkPolicy spec.
 //
 //  3. Renderer (render.go) — pure function from []Rule to an nft script
 //     (string). Output is deterministic so the apply stage can de-dupe.
 //
 //  4. Apply (apply_linux.go) — shell out to `nft -f -` for an atomic
 //     reconfiguration. nftables guarantees the whole script applies as a
 //     single transaction; partial failures roll back automatically.
 //
 // # Why nftables (and not eBPF)
 //
 // Atomic ruleset transactions, kernel-native, no userspace ebpf-loader to
 // maintain, and behaviour an operator can read directly with
 // `nft list ruleset`. The cost is that we walk per-pod chains in software,
 // which is fine at the cluster sizes flock targets.
 package netpol
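The isolation-by-selection rule from the package comment can be sketched in a few lines. This is a simplified illustration, not flock's translator: the `policy` and `pod` types and the `selects` / `isolated` helpers are hypothetical, and only `matchLabels`-style selectors are modelled (real selectors also support `matchExpressions`).

```go
package main

import "fmt"

// policy is a hypothetical, stripped-down stand-in for a NetworkPolicy.
type policy struct {
	namespace   string
	matchLabels map[string]string // empty selects every pod in the namespace
	ingress     bool              // policy applies to the Ingress direction
	egress      bool              // policy applies to the Egress direction
}

// pod carries only what selection needs.
type pod struct {
	namespace string
	labels    map[string]string
}

// selects reports whether p's selector matches target. Policies only ever
// select pods in their own namespace.
func selects(p policy, target pod) bool {
	if p.namespace != target.namespace {
		return false
	}
	for k, want := range p.matchLabels {
		if got, ok := target.labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// isolated returns, per direction, whether any policy selects the pod.
// A selected direction is default-deny; an unselected one stays open.
func isolated(policies []policy, target pod) (ingress, egress bool) {
	for _, p := range policies {
		if !selects(p, target) {
			continue
		}
		ingress = ingress || p.ingress
		egress = egress || p.egress
	}
	return ingress, egress
}

func main() {
	policies := []policy{
		{namespace: "ns1", ingress: true}, // default-deny ingress for ns1
		{namespace: "ns2", matchLabels: map[string]string{"app": "db"}, egress: true},
	}
	web := pod{namespace: "ns1", labels: map[string]string{"app": "web"}}
	other := pod{namespace: "ns3"}

	in, eg := isolated(policies, web)
	fmt.Println(in, eg) // true false
	in, eg = isolated(policies, other)
	fmt.Println(in, eg) // false false
}
```

The allow rules themselves are omitted here; the sketch only decides which directions need a default-deny chain at all.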
@@ -0,0 +1,222 @@
 package netpol
 import (
 	"context"
 	"fmt"
 	"log/slog"
 	"net"
 	"sync"
 	"time"
 	corev1 "k8s.io/api/core/v1"
 	netv1 "k8s.io/api/networking/v1"
 	"k8s.io/client-go/informers"
 	"k8s.io/client-go/kubernetes"
 	"k8s.io/client-go/rest"
 	"k8s.io/client-go/tools/cache"
 )
 // World aggregates the cluster-wide caches the reconciler queries on
 // every pass: NetworkPolicies, Namespaces, and all Pods (for peer
 // resolution). The maps are guarded by mu; read them through the
 // snapshot accessors.
 type World struct {
 	logger *slog.Logger
 	mu       sync.RWMutex
 	policies map[string]netv1.NetworkPolicy // key = ns/name
 	namespaces map[string]Namespace
 	peerPods   map[string]PeerPod // key = ns/name
 	onChange []func()
 }
 // NewWorld returns an empty World. Callers should call Start to populate
 // it; before Start, the snapshot accessors return empty slices.
 func NewWorld(logger *slog.Logger) *World {
 	return &World{
 		logger:     logger,
 		policies:   map[string]netv1.NetworkPolicy{},
 		namespaces: map[string]Namespace{},
 		peerPods:   map[string]PeerPod{},
 	}
 }
 // OnChange registers a callback fired (synchronously, inside the informer
 // event handler) whenever any watched object changes. The reconciler
 // uses this to debounce policy reloads.
 func (w *World) OnChange(f func()) {
 	w.mu.Lock()
 	defer w.mu.Unlock()
 	w.onChange = append(w.onChange, f)
 }
 func (w *World) fireChange() {
 	w.mu.RLock()
 	cbs := append([]func(){}, w.onChange...)
 	w.mu.RUnlock()
 	for _, f := range cbs {
 		f()
 	}
 }
 // Start launches three informers (NetworkPolicy, Namespace, Pod) against
 // the cluster API. It blocks until each cache reports synced. The caller
 // is responsible for cancelling ctx on shutdown.
 func (w *World) Start(ctx context.Context, cfg *rest.Config) error {
 	cs, err := kubernetes.NewForConfig(cfg)
 	if err != nil {
 		return fmt.Errorf("kubernetes client: %w", err)
 	}
 	factory := informers.NewSharedInformerFactory(cs, 10*time.Minute)
 	npInformer := factory.Networking().V1().NetworkPolicies().Informer()
 	nsInformer := factory.Core().V1().Namespaces().Informer()
 	podInformer := factory.Core().V1().Pods().Informer()
 	if _, err := npInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
 		AddFunc:    func(obj interface{}) { w.onPolicy(obj, false) },
 		UpdateFunc: func(_, n interface{}) { w.onPolicy(n, false) },
 		DeleteFunc: func(obj interface{}) { w.onPolicy(obj, true) },
 	}); err != nil {
 		return fmt.Errorf("add netpol handler: %w", err)
 	}
 	if _, err := nsInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
 		AddFunc:    func(obj interface{}) { w.onNamespace(obj, false) },
 		UpdateFunc: func(_, n interface{}) { w.onNamespace(n, false) },
 		DeleteFunc: func(obj interface{}) { w.onNamespace(obj, true) },
 	}); err != nil {
 		return fmt.Errorf("add ns handler: %w", err)
 	}
 	if _, err := podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
 		AddFunc:    func(obj interface{}) { w.onPod(obj, false) },
 		UpdateFunc: func(_, n interface{}) { w.onPod(n, false) },
 		DeleteFunc: func(obj interface{}) { w.onPod(obj, true) },
 	}); err != nil {
 		return fmt.Errorf("add pod handler: %w", err)
 	}
 	w.logger.Info("netpol informers starting")
 	factory.Start(ctx.Done())
 	if !cache.WaitForCacheSync(ctx.Done(),
 		npInformer.HasSynced, nsInformer.HasSynced, podInformer.HasSynced) {
 		return fmt.Errorf("netpol informer caches failed to sync")
 	}
 	w.logger.Info("netpol informers synced",
 		"netpols", len(w.snapshotPolicies()),
 		"namespaces", len(w.snapshotNamespaces()),
 		"peer_pods", len(w.snapshotPeerPods()))
 	return nil
 }
 // unwrapDFSU lifts a DeletedFinalStateUnknown wrapper if present.
 func unwrapDFSU(obj interface{}) interface{} {
 	if d, ok := obj.(cache.DeletedFinalStateUnknown); ok {
 		return d.Obj
 	}
 	return obj
 }
 func (w *World) onPolicy(obj interface{}, deleted bool) {
 	p, ok := unwrapDFSU(obj).(*netv1.NetworkPolicy)
 	if !ok || p == nil {
 		return
 	}
 	key := p.Namespace + "/" + p.Name
 	w.mu.Lock()
 	if deleted {
 		delete(w.policies, key)
 	} else {
 		w.policies[key] = *p
 	}
 	w.mu.Unlock()
 	w.fireChange()
 }
 func (w *World) onNamespace(obj interface{}, deleted bool) {
 	ns, ok := unwrapDFSU(obj).(*corev1.Namespace)
 	if !ok || ns == nil {
 		return
 	}
 	w.mu.Lock()
 	if deleted {
 		delete(w.namespaces, ns.Name)
 	} else {
 		w.namespaces[ns.Name] = Namespace{Name: ns.Name, Labels: ns.Labels}
 	}
 	w.mu.Unlock()
 	w.fireChange()
 }
 func (w *World) onPod(obj interface{}, deleted bool) {
 	pod, ok := unwrapDFSU(obj).(*corev1.Pod)
 	if !ok || pod == nil {
 		return
 	}
 	key := pod.Namespace + "/" + pod.Name
 	w.mu.Lock()
 	if deleted {
 		delete(w.peerPods, key)
 	} else {
 		w.peerPods[key] = PeerPod{
 			Namespace: pod.Namespace,
 			Name:      pod.Name,
 			Labels:    pod.Labels,
 			IPs:       podIPs(pod),
 		}
 	}
 	w.mu.Unlock()
 	w.fireChange()
 }
 // podIPs extracts every parseable PodIP from the status. Pods without a
 // status yet (still scheduling) yield an empty slice, which is safe for
 // the translator.
 func podIPs(p *corev1.Pod) []net.IP {
 	out := make([]net.IP, 0, len(p.Status.PodIPs))
 	for _, addr := range p.Status.PodIPs {
 		ip := net.ParseIP(addr.IP)
 		if ip == nil {
 			continue
 		}
 		out = append(out, ip)
 	}
 	if len(out) == 0 && p.Status.PodIP != "" {
 		// Older clusters may populate PodIP but not PodIPs; tolerate both.
 		if ip := net.ParseIP(p.Status.PodIP); ip != nil {
 			out = append(out, ip)
 		}
 	}
 	return out
 }
 // snapshotPolicies returns a new slice of the cached policies. Each
 // element is a shallow copy, so nested maps and slices must be treated
 // as read-only.
 func (w *World) snapshotPolicies() []netv1.NetworkPolicy {
 	w.mu.RLock()
 	defer w.mu.RUnlock()
 	out := make([]netv1.NetworkPolicy, 0, len(w.policies))
 	for _, p := range w.policies {
 		out = append(out, p)
 	}
 	return out
 }
 // snapshotNamespaces returns a new slice of the cached namespaces. Label
 // maps are shared with the cache and must be treated as read-only.
 func (w *World) snapshotNamespaces() []Namespace {
 	w.mu.RLock()
 	defer w.mu.RUnlock()
 	out := make([]Namespace, 0, len(w.namespaces))
 	for _, n := range w.namespaces {
 		out = append(out, n)
 	}
 	return out
 }
 // snapshotPeerPods returns a new slice of the cached peer pods. Label
 // maps and IP slices are shared with the cache and must be treated as
 // read-only.
 func (w *World) snapshotPeerPods() []PeerPod {
 	w.mu.RLock()
 	defer w.mu.RUnlock()
 	out := make([]PeerPod, 0, len(w.peerPods))
 	for _, p := range w.peerPods {
 		out = append(out, p)
 	}
 	return out
 }
@@ -0,0 +1,115 @@
 package netpol
 import (
 	"context"
 	"log/slog"
 	"sync"
 	"time"
 )
 // LocalPodSource produces the set of local pods (with their HostIface and
 // IPs) the reconciler should enforce policy for. The agent's allocation
 // store + pod informer is the natural implementer.
 //
 // The reconciler calls it on every pass while holding only its own mutex
 // (no World lock), so the function must do its own synchronisation and
 // be safe for concurrent invocation.
 type LocalPodSource func() []Pod
 // Reconciler turns the World cache + LocalPodSource into nft rule
 // applications. One reconcile pass:
 //
 //	pods + policies + namespaces  →  Translate  →  Render  →  Apply
 //
 // The pass runs on:
 //
 //   - World.OnChange (any informer event), debounced through a single
 //     coalescing channel,
 //   - a periodic tick (default 30s) so we self-heal if the kernel
 //     ruleset diverges from desired (e.g. someone manually `nft flush`d),
 //   - and explicit Trigger() calls (the agent fires this from CNI ADD /
 //     DEL hooks so policy lands before pod traffic flows).
 type Reconciler struct {
 	World    *World
 	Local    LocalPodSource
 	Applier  *Applier
 	Logger   *slog.Logger
 	Interval time.Duration
 	mu      sync.Mutex
 	trigger chan struct{}
 }
 // NewReconciler returns a Reconciler ready to Run. Interval defaults to
 // 30s if zero.
 func NewReconciler(world *World, local LocalPodSource, applier *Applier, logger *slog.Logger) *Reconciler {
 	r := &Reconciler{
 		World:    world,
 		Local:    local,
 		Applier:  applier,
 		Logger:   logger,
 		Interval: 30 * time.Second,
 		trigger:  make(chan struct{}, 1),
 	}
 	world.OnChange(r.Trigger)
 	return r
 }
 // Trigger requests one reconcile pass. Coalesces — if a pass is already
 // pending, the call is a no-op.
 func (r *Reconciler) Trigger() {
 	select {
 	case r.trigger <- struct{}{}:
 	default:
 	}
 }
 // Run blocks until ctx is cancelled. Reconciles on Trigger or every
 // Interval; calls Applier.Clear on shutdown.
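 func (r *Reconciler) Run(ctx context.Context) {
 	// time.NewTicker panics on a non-positive duration, so fall back to
 	// the documented 30s default if a caller zeroed Interval.
 	interval := r.Interval
 	if interval <= 0 {
 		interval = 30 * time.Second
 	}
 	t := time.NewTicker(interval)
 	defer t.Stop()
 	r.reconcile(ctx) // initial pass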
 	for {
 		select {
 		case <-ctx.Done():
 			// Best-effort: drop our table on graceful exit. If the agent
 			// crashed without doing this, the next agent's first apply
 			// will replace the stale table atomically anyway.
 			_ = r.Applier.Clear(context.Background())
 			return
 		case <-t.C:
 			r.reconcile(ctx)
 		case <-r.trigger:
 			r.reconcile(ctx)
 		}
 	}
 }
 func (r *Reconciler) reconcile(ctx context.Context) {
 	r.mu.Lock()
 	defer r.mu.Unlock()
 	in := Inputs{
 		LocalPods:  r.Local(),
 		PeerPods:   r.World.snapshotPeerPods(),
 		Namespaces: r.World.snapshotNamespaces(),
 		Policies:   r.World.snapshotPolicies(),
 	}
 	out, err := Translate(in, func(s string) { r.Logger.Warn(s) })
 	if err != nil {
 		r.Logger.Warn("netpol translate failed", "err", err)
 		return
 	}
 	script := Render(out)
 	if err := r.Applier.Apply(ctx, script); err != nil {
 		r.Logger.Warn("netpol apply failed", "err", err)
 		return
 	}
 	if len(out.Isolated) > 0 {
 		r.Logger.Info("netpol applied",
 			"isolated_chains", len(out.Isolated),
 			"rules", len(out.Rules),
 			"local_pods", len(in.LocalPods),
 			"policies", len(in.Policies))
 	}
 }
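The coalescing behaviour of Trigger comes entirely from the one-slot buffered channel plus a non-blocking send. A minimal stand-alone sketch of that pattern follows; the `coalescer` type and its `drain` helper are hypothetical names used only for illustration, not part of the package.

```go
package main

import "fmt"

// coalescer holds at most one pending request, like Reconciler.trigger.
type coalescer struct {
	pending chan struct{}
}

func newCoalescer() *coalescer {
	return &coalescer{pending: make(chan struct{}, 1)}
}

// Trigger queues a request without blocking. It reports false when a
// request was already pending, in which case the call is a no-op.
func (c *coalescer) Trigger() bool {
	select {
	case c.pending <- struct{}{}:
		return true
	default:
		return false
	}
}

// drain consumes the pending request, if any, and reports whether a
// reconcile pass should run.
func (c *coalescer) drain() bool {
	select {
	case <-c.pending:
		return true
	default:
		return false
	}
}

func main() {
	c := newCoalescer()

	// A burst of five events arrives before the consumer wakes up.
	queued := 0
	for i := 0; i < 5; i++ {
		if c.Trigger() {
			queued++
		}
	}

	// The consumer sees exactly one pending pass for the whole burst.
	passes := 0
	for c.drain() {
		passes++
	}
	fmt.Println(queued, passes) // 1 1
}
```

Because the send never blocks, this is safe to call from inside informer event handlers; the cost is that a burst of events collapses into one pass, which is what the reconciler wants since each pass reads a full snapshot.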
@@ -0,0 +1,160 @@
 package netpol
 import (
 	"context"
 	"io"
 	"log/slog"
 	"net"
 	"strings"
 	"sync"
 	"sync/atomic"
 	"testing"
 	corev1 "k8s.io/api/core/v1"
 	netv1 "k8s.io/api/networking/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 )
 // fakeApplier captures Apply calls for assertion. It mirrors the
 // (Apply, Clear) pair of *Applier and is driven through reconcileOnce.
 type fakeApplier struct {
 	mu    sync.Mutex
 	calls []string
 	last  string
 	err   error
 }
 func (f *fakeApplier) Apply(_ context.Context, script string) error {
 	f.mu.Lock()
 	defer f.mu.Unlock()
 	if f.err != nil {
 		return f.err
 	}
 	if script == f.last {
 		return nil // de-dup like the real Applier
 	}
 	f.last = script
 	f.calls = append(f.calls, script)
 	return nil
 }
 func (f *fakeApplier) Clear(_ context.Context) error { return nil }
 func (f *fakeApplier) lastScript() string {
 	f.mu.Lock()
 	defer f.mu.Unlock()
 	return f.last
 }
 func (f *fakeApplier) callCount() int {
 	f.mu.Lock()
 	defer f.mu.Unlock()
 	return len(f.calls)
 }
 // applierIface is satisfied by *Applier and *fakeApplier. Reconciler holds
 // a concrete *Applier, so the tests run the pipeline through reconcileOnce
 // with this interface instead of constructing a Reconciler.
 type applierIface interface {
 	Apply(context.Context, string) error
 	Clear(context.Context) error
 }
 // reconcileOnce drives one pass synchronously without spinning a goroutine.
 func reconcileOnce(t *testing.T, world *World, local LocalPodSource, app applierIface) {
 	t.Helper()
 	in := Inputs{
 		LocalPods:  local(),
 		PeerPods:   world.snapshotPeerPods(),
 		Namespaces: world.snapshotNamespaces(),
 		Policies:   world.snapshotPolicies(),
 	}
 	out, err := Translate(in, nil)
 	if err != nil {
 		t.Fatal(err)
 	}
 	if err := app.Apply(context.Background(), Render(out)); err != nil {
 		t.Fatal(err)
 	}
 }
 // silentLogger returns a slog.Logger discarding everything — keeps test
 // output tidy.
 func silentLogger() *slog.Logger {
 	return slog.New(slog.NewTextHandler(io.Discard, &slog.HandlerOptions{}))
 }
 func TestReconciler_NoIsolatedPods_ShortScript(t *testing.T) {
 	world := NewWorld(silentLogger())
 	local := func() []Pod { return nil }
 	app := &fakeApplier{}
 	reconcileOnce(t, world, local, app)
 	got := app.lastScript()
 	if !strings.Contains(got, "table inet flock_netpol") {
 		t.Fatalf("missing table:\n%s", got)
 	}
 	// Without any isolated pods the base chain has policy accept and no
 	// jumps. That's the desired "open" state.
 	if strings.Contains(got, "jump pod_") {
 		t.Fatalf("unexpected jump in open state:\n%s", got)
 	}
 }
 func TestReconciler_PolicyIsolatesLocalPod(t *testing.T) {
 	world := NewWorld(silentLogger())
 	// Seed a default-deny policy in ns1.
 	world.onPolicy(&netv1.NetworkPolicy{
 		ObjectMeta: metav1.ObjectMeta{Namespace: "ns1", Name: "deny-all"},
 		Spec: netv1.NetworkPolicySpec{
 			PodSelector: metav1.LabelSelector{},
 			PolicyTypes: []netv1.PolicyType{netv1.PolicyTypeIngress},
 		},
 	}, false)
 	local := func() []Pod {
 		return []Pod{{
 			Namespace: "ns1", Name: "web",
 			Labels:    map[string]string{"app": "web"},
 			HostIface: "flock00000001",
 			IPs:       []net.IP{mustIP("2001:db8::1")},
 		}}
 	}
 	app := &fakeApplier{}
 	reconcileOnce(t, world, local, app)
 	got := app.lastScript()
 	if !strings.Contains(got, "_ingress {") {
 		t.Fatalf("expected pod ingress chain:\n%s", got)
 	}
 	if !strings.Contains(got, "drop") {
 		t.Fatalf("expected default-deny drop:\n%s", got)
 	}
 	if !strings.Contains(got, `oifname "flock00000001" jump pod_`) {
 		t.Fatalf("expected base-chain jump anchored on veth:\n%s", got)
 	}
 }
 func TestReconciler_DedupesIdenticalRender(t *testing.T) {
 	world := NewWorld(silentLogger())
 	local := func() []Pod {
 		return []Pod{{
 			Namespace: "ns1", Name: "web", HostIface: "f1",
 			IPs: []net.IP{mustIP("2001:db8::1")},
 		}}
 	}
 	app := &fakeApplier{}
 	reconcileOnce(t, world, local, app)
 	reconcileOnce(t, world, local, app)
 	reconcileOnce(t, world, local, app)
 	if got := app.callCount(); got != 1 {
 		t.Fatalf("expected 1 unique apply, got %d", got)
 	}
 }
 func TestReconciler_OnChangeFiresTrigger(t *testing.T) {
 	world := NewWorld(silentLogger())
 	var triggered atomic.Int32
 	world.OnChange(func() { triggered.Add(1) })
 	world.onNamespace(&corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: "foo"}}, false)
 	world.onPolicy(&netv1.NetworkPolicy{ObjectMeta: metav1.ObjectMeta{Namespace: "foo", Name: "p"}}, false)
 	if triggered.Load() != 2 {
 		t.Fatalf("expected 2 OnChange calls, got %d", triggered.Load())
 	}
 }
@@ -0,0 +1,315 @@
 package netpol
 import (
 	"fmt"
 	"hash/fnv"
 	"net"
 	"sort"
 	"strings"
 )
 // Render produces an nftables script that, when applied with `nft -f -`,
 // installs the desired NetworkPolicy enforcement state for this node.
 //
 // Layout (pod chains are emitted first so the base chain can reference
 // them):
 //
 //	table inet flock_netpol {
 //	  chain pod_<hash>_ingress {           # one per isolated direction
 //	    ct state established,related accept
 //	    # explicit allow lines (none for default-deny)
 //	    drop
 //	  }
 //	  chain pod_<hash>_egress { ... }
 //	  chain forward {                      # base chain on hook forward
 //	    type filter hook forward priority filter; policy accept;
 //	    # one veth-anchored jump per (pod, direction) chain
 //	    iifname "flock1a2b3c4d" jump pod_<hash>_egress
 //	    oifname "flock1a2b3c4d" jump pod_<hash>_ingress
 //	  }
 //	}
 //
 // The whole table is replaced atomically: a "destroy table", which
 // succeeds whether or not the table exists, followed by the table
 // definition with its chains. nft executes the script as a single
 // transaction; partial application is impossible.
 //
 // Output is deterministic: equal Output → byte-identical script. The
 // reconciler relies on this for de-dup.
 func Render(out Output) string {
 	var sb strings.Builder
 	sb.WriteString("# Generated by flock-agent netpol; do not edit by hand.\n")
 	// "destroy" removes the table if it exists and, unlike "delete",
 	// does not fail when it is absent (first run). The table block
 	// below then recreates everything.
 	sb.WriteString("destroy table inet flock_netpol\n")
 	sb.WriteString("table inet flock_netpol {\n")
 	// Build per-(pod, direction) chains. We need them defined BEFORE the
 	// base chain references them, so we render chains first.
 	chains := buildChains(out)
 	for _, c := range chains {
 		writeChain(&sb, c)
 	}
 	// Base chain emits jumps in a stable order (chain name asc).
 	sb.WriteString("\tchain forward {\n")
 	sb.WriteString("\t\ttype filter hook forward priority filter; policy accept;\n")
 	for _, c := range chains {
 		writeBaseJump(&sb, c)
 	}
 	sb.WriteString("\t}\n")
 	sb.WriteString("}\n")
 	return sb.String()
 }
 // chain is one rendered chain — one direction of one pod.
 type chain struct {
 	name      string // pod_<hash>_ingress / _egress
 	hostIface string
 	podIPs    []net.IP
 	direction Direction
 	rules     []Rule
 	policy    string // "drop" or "accept"
 }
 // buildChains groups rules by (PodKey, Direction) and adds default-deny
 // chains for isolated directions that received no explicit rules.
 func buildChains(out Output) []chain {
 	type key struct {
 		podKey string
 		dir    Direction
 	}
 	byKey := map[key]*chain{}
 	// Seed isolated directions with empty chains so default-deny lands
 	// even when no explicit allow rule was emitted for them.
 	for iso := range out.Isolated {
 		byKey[key{podKey: iso.PodKey, dir: iso.Direction}] = &chain{
 			direction: iso.Direction,
 			policy:    "drop",
 		}
 	}
 	// Append rules into their chain. Rule.PodIPs and HostIface are
 	// authoritative — every rule for a given pod carries the same values
 	// (translator invariant), so we copy from the first.
 	for _, r := range out.Rules {
 		k := key{podKey: r.PodKey, dir: r.Direction}
 		c := byKey[k]
 		if c == nil {
 			// Rule for a non-isolated direction shouldn't happen in
 			// practice (translator only emits rules for selected pods)
 			// but be tolerant — the chain just gets policy accept.
 			c = &chain{direction: r.Direction, policy: "accept"}
 			byKey[k] = c
 		}
 		c.rules = append(c.rules, r)
 		if c.hostIface == "" {
 			c.hostIface = r.HostIface
 			c.podIPs = append([]net.IP(nil), r.PodIPs...)
 		}
 	}
 	// If a chain was created from Isolated only (no rules), look up the
 	// pod's HostIface + IPs from Output.Pods. This is the path a
 	// default-deny policy takes — no allow rules, only isolation.
 	for k, c := range byKey {
 		if c.hostIface != "" {
 			continue
 		}
 		if lp, ok := out.Pods[k.podKey]; ok {
 			c.hostIface = lp.HostIface
 			c.podIPs = append([]net.IP(nil), lp.IPs...)
 			continue
 		}
 		// Last resort: lift from any rule sharing the PodKey. Should
 		// not normally happen — the translator populates Pods for every
 		// isolated pod — but defends against partially-populated Output
 		// values constructed by tests.
 		for _, r := range out.Rules {
 			if r.PodKey == k.podKey {
 				c.hostIface = r.HostIface
 				c.podIPs = append([]net.IP(nil), r.PodIPs...)
 				break
 			}
 		}
 	}
 	// Materialise chain names and emit in deterministic order.
 	var chains []chain
 	for k, c := range byKey {
 		if c.hostIface == "" {
 			continue // can't jump to it; skip
 		}
 		c.name = chainName(k.podKey, c.direction)
 		chains = append(chains, *c)
 	}
 	sort.Slice(chains, func(i, j int) bool { return chains[i].name < chains[j].name })
 	return chains
 }
 // chainName produces a stable, name-safe chain identifier. Pod keys can
 // contain characters nft doesn't allow in identifiers, so we hash them.
 // Direction keeps ingress and egress separate.
 func chainName(podKey string, dir Direction) string {
 	h := fnv.New64a()
 	_, _ = h.Write([]byte(podKey))
 	return fmt.Sprintf("pod_%016x_%s", h.Sum64(), dir)
 }
 // writeChain emits the chain definition. Chains without allow rules exist
 // deliberately: the trailing drop statement IS the default-deny.
 func writeChain(sb *strings.Builder, c chain) {
 	fmt.Fprintf(sb, "\tchain %s {\n", c.name)
 	// Stateful accept for return traffic. NetworkPolicy applies to the
 	// start of a new connection — reply packets for pod-initiated flows
 	// (egress) and follow-up packets of an established ingress flow must
 	// pass regardless of the explicit allow set, otherwise the chain's
 	// final drop kills ephemeral-port replies (e.g. pod → kube-apiserver).
 	sb.WriteString("\t\tct state established,related accept\n")
 	for _, r := range c.rules {
 		writeAllowRule(sb, r)
 	}
 	if c.policy == "drop" {
 		sb.WriteString("\t\tdrop\n")
 	}
 	sb.WriteString("\t}\n")
 }
 // writeAllowRule emits one line per port and address family:
 //
 //	[ip|ip6 saddr { peers }] [ip|ip6 saddr != { except }] [proto dport port|port-end] <action>
 //
 // The saddr / daddr field flips based on direction (ingress = from peer →
 // match saddr; egress = to peer → match daddr).
 func writeAllowRule(sb *strings.Builder, r Rule) {
 	v6Peers, v4Peers := splitFamily(r.PeerCIDRs)
 	v6Except, v4Except := splitFamily(r.PeerExcept)
 	v6Pod, v4Pod := splitIPFamily(r.PodIPs)
 	hasPeerFilter := len(r.PeerCIDRs) > 0
 	emit := func(family string, peers, except []*net.IPNet, podIP net.IP) {
 		if hasPeerFilter && len(peers) == 0 && len(except) == 0 {
 			// Peer filter exists but no entries of this family — rule
 			// must not match anything for this family.
 			return
 		}
 		if podIP == nil {
 			// Pod has no address of this family; nothing to guard.
 			return
 		}
 		for _, port := range r.Ports {
 			sb.WriteString("\t\t")
 			// Peer (saddr/daddr) match: address is "peer's address",
 			// which is saddr on ingress and daddr on egress.
 			peerField := peerAddrField(family, r.Direction)
 			if hasPeerFilter && len(peers) > 0 {
 				fmt.Fprintf(sb, "%s { %s } ", peerField, joinCIDRs(peers))
 			}
 			if hasPeerFilter && len(except) > 0 {
 				fmt.Fprintf(sb, "%s != { %s } ", peerField, joinCIDRs(except))
 			}
 			// Port match.
 			writePortMatch(sb, port)
 			fmt.Fprintf(sb, "%s\n", r.Action)
 		}
 	}
 	emit("ip6", v6Peers, v6Except, v6Pod)
 	emit("ip", v4Peers, v4Except, v4Pod)
 }
 // peerAddrField returns "ip6 saddr" / "ip saddr" / "ip6 daddr" / "ip daddr"
 // depending on family + direction. Ingress matches the peer as the source;
 // egress matches the peer as the destination.
 func peerAddrField(family string, dir Direction) string {
 	switch {
 	case dir == DirIngress:
 		return family + " saddr"
 	default:
 		return family + " daddr"
 	}
 }
 // writePortMatch appends "tcp dport 80 " (single port),
 // "tcp dport 8000-8999 " (range), "meta l4proto tcp " (protocol only), or
 // nothing when neither port nor protocol is set.
 func writePortMatch(sb *strings.Builder, p PortMatch) {
 	if p.Port == 0 && p.Protocol == "" {
 		return
 	}
 	proto := p.Protocol
 	if proto == "" {
 		proto = "tcp"
 	}
 	if p.Port == 0 {
 		// Protocol-only match. nft has `meta l4proto tcp`.
 		fmt.Fprintf(sb, "meta l4proto %s ", proto)
 		return
 	}
 	if p.EndPort > p.Port {
 		fmt.Fprintf(sb, "%s dport %d-%d ", proto, p.Port, p.EndPort)
 		return
 	}
 	fmt.Fprintf(sb, "%s dport %d ", proto, p.Port)
 }
 // writeBaseJump emits one line per (pod, direction) chain in the base
 // `forward` chain. The match is anchored on the host-side veth name —
 // the veth uniquely belongs to one pod, so anything traversing it is
 // to/from that pod by definition.
 //
 // We deliberately don't filter on the pod's eth0 address: the pod can
 // also receive traffic addressed to its anycast IP (or any other host
 // route the operator has installed via flock-agent), and policy must
 // apply uniformly to all of it.
 func writeBaseJump(sb *strings.Builder, c chain) {
 	var iface string
 	if c.direction == DirEgress {
 		iface = "iifname"
 	} else {
 		iface = "oifname"
 	}
 	fmt.Fprintf(sb, "\t\t%s \"%s\" jump %s\n", iface, c.hostIface, c.name)
 }
 // splitFamily partitions CIDRs into (v6, v4) lists, preserving order
 // within each family.
 func splitFamily(cs []*net.IPNet) ([]*net.IPNet, []*net.IPNet) {
 	var v6, v4 []*net.IPNet
 	for _, c := range cs {
 		if c.IP.To4() != nil {
 			v4 = append(v4, c)
 		} else {
 			v6 = append(v6, c)
 		}
 	}
 	return v6, v4
 }
 // splitIPFamily picks one v6 and one v4 from a list of pod IPs (a pod has
 // at most one of each in flock's model).
 func splitIPFamily(ips []net.IP) (v6, v4 net.IP) {
 	for _, ip := range ips {
 		if ip == nil {
 			continue
 		}
 		if ip.To4() != nil {
 			if v4 == nil {
 				v4 = ip
 			}
 		} else {
 			if v6 == nil {
 				v6 = ip
 			}
 		}
 	}
 	return
 }
 func joinCIDRs(cs []*net.IPNet) string {
 	parts := make([]string, len(cs))
 	for i, c := range cs {
 		parts[i] = c.String()
 	}
 	sort.Strings(parts)
 	return strings.Join(parts, ", ")
 }
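The naming and jump scheme used by the renderer can be tried in isolation. The sketch below reuses the FNV-64a approach from chainName and the veth-only match from writeBaseJump, but takes the direction as a plain string; that signature and the `jumpLine` helper are assumptions made to keep the example self-contained.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// chainName hashes the pod key so the result is always a valid nft
// identifier, and appends the direction to keep ingress and egress apart.
func chainName(podKey, direction string) string {
	h := fnv.New64a()
	_, _ = h.Write([]byte(podKey))
	return fmt.Sprintf("pod_%016x_%s", h.Sum64(), direction)
}

// jumpLine builds the base-chain rule. Ingress traffic leaves the host
// through the pod's veth (oifname); egress traffic enters through it
// (iifname). No address match is added, so every route to the pod is
// covered.
func jumpLine(hostIface, direction, chain string) string {
	match := "oifname"
	if direction == "egress" {
		match = "iifname"
	}
	return fmt.Sprintf("%s %q jump %s", match, hostIface, chain)
}

func main() {
	in := chainName("storage/garage-0", "ingress")
	eg := chainName("storage/garage-0", "egress")

	// Same pod key gives the same hash; only the suffix differs.
	fmt.Println(in[:len(in)-len("ingress")] == eg[:len(eg)-len("egress")]) // true

	fmt.Println(jumpLine("flock00000003", "ingress", in))
	fmt.Println(jumpLine("flock00000003", "egress", eg))
}
```

A 64-bit hash keeps names short and fixed-width; collisions between pod keys on one node are possible in principle but very unlikely at the pod counts a single node hosts.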
@@ -0,0 +1,228 @@
 package netpol
 import (
 	"net"
 	"strings"
 	"testing"
 )
 // TestRender_DefaultDeny — an isolated direction with no rules renders
 // to a chain whose last action is "drop".
 func TestRender_DefaultDeny(t *testing.T) {
 	out := Output{
 		Isolated: map[Isolation]struct{}{
 			{PodKey: "ns/web", Direction: DirIngress}: {},
 		},
 		Rules: []Rule{
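 			// This rule supplies the chain's HostIface + PodIPs because
 			// Output.Pods is not set here. Its empty PortMatch renders as
 			// a bare "accept", so the test only checks that the trailing
 			// drop and the base-chain jump are emitted.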
 			{PodKey: "ns/web", HostIface: "flock00000001", PodIPs: []net.IP{mustIP("2001:db8::1")},
 				Direction: DirIngress, Action: ActionAccept,
 				Ports: []PortMatch{{}}},
 		},
 	}
 	got := Render(out)
 	if !strings.Contains(got, "table inet flock_netpol") {
 		t.Fatalf("missing table:\n%s", got)
 	}
 	if !strings.Contains(got, "type filter hook forward") {
 		t.Fatalf("missing base chain:\n%s", got)
 	}
 	if !strings.Contains(got, "drop") {
 		t.Fatalf("expected default-deny drop in chain:\n%s", got)
 	}
 	// Pod chain name must be deterministic-looking (pod_<hex>_ingress).
 	if !strings.Contains(got, "_ingress {") {
 		t.Fatalf("missing pod ingress chain:\n%s", got)
 	}
 	// Base chain jump anchored solely on veth — anycast must not bypass.
 	if !strings.Contains(got, `oifname "flock00000001" jump pod_`) {
 		t.Fatalf("missing veth-only ingress jump in base chain:\n%s", got)
 	}
 	// Stateful accept must be present so reply traffic for pod-initiated
 	// outbound (e.g. ephemeral-port replies from kube-apiserver) is not
 	// dropped by the chain's final drop. Regression guard: production hit
 	// this when garage's k8s-discovery → apiserver replies got dropped.
 	if !strings.Contains(got, "ct state established,related accept") {
 		t.Fatalf("missing ct state established,related accept:\n%s", got)
 	}
 }
 // TestRender_DualStack — dual-stack pod gets one veth-anchored jump per
 // direction (no per-family jump; the chain handles both).
 func TestRender_DualStack(t *testing.T) {
 	out := Output{
 		Isolated: map[Isolation]struct{}{
 			{PodKey: "ns/web", Direction: DirIngress}: {},
 		},
 		Rules: []Rule{{
 			PodKey: "ns/web", HostIface: "f1",
 			PodIPs:    []net.IP{mustIP("2001:db8::1"), mustIP("10.0.0.1")},
 			Direction: DirIngress, Action: ActionAccept,
 			Ports: []PortMatch{{Protocol: "tcp", Port: 80}},
 		}},
 	}
 	got := Render(out)
 	// Exactly one ingress jump line with no per-family daddr.
 	if strings.Count(got, `oifname "f1" jump`) != 1 {
 		t.Fatalf("expected exactly one veth-only ingress jump:\n%s", got)
 	}
 	// No peer filter is set, so the accept lines inside the pod chain
 	// must not carry ip6/ip saddr matches (see TestRender_AllowAllPeers).
 	if strings.Contains(got, "ip6 saddr") || strings.Contains(got, "ip saddr") {
 		t.Fatalf("unexpected per-family saddr filter without peers:\n%s", got)
 	}
 }
 // TestRender_PortAndPeer — a Rule with peer + port emits a syntactically
 // well-formed allow line.
 func TestRender_PortAndPeer(t *testing.T) {
 	out := Output{
 		Isolated: map[Isolation]struct{}{
 			{PodKey: "ns/web", Direction: DirIngress}: {},
 		},
 		Rules: []Rule{{
 			PodKey: "ns/web", HostIface: "f1",
 			PodIPs:    []net.IP{mustIP("2001:db8::1")},
 			Direction: DirIngress, Action: ActionAccept,
 			PeerCIDRs: []*net.IPNet{mustNet("2001:db8::a/128")},
 			Ports:     []PortMatch{{Protocol: "tcp", Port: 80}},
 		}},
 	}
 	got := Render(out)
 	if !strings.Contains(got, "ip6 saddr { 2001:db8::a/128 } tcp dport 80 accept") {
 		t.Fatalf("expected ingress allow with v6 peer + tcp/80:\n%s", got)
 	}
 }
 // TestRender_PortRange — endPort renders as "8000-8999".
 func TestRender_PortRange(t *testing.T) {
 	out := Output{
 		Isolated: map[Isolation]struct{}{
 			{PodKey: "ns/web", Direction: DirIngress}: {},
 		},
 		Rules: []Rule{{
 			PodKey: "ns/web", HostIface: "f1",
 			PodIPs:    []net.IP{mustIP("2001:db8::1")},
 			Direction: DirIngress, Action: ActionAccept,
 			PeerCIDRs: []*net.IPNet{mustNet("0.0.0.0/0"), mustNet("::/0")},
 			Ports:     []PortMatch{{Protocol: "tcp", Port: 8000, EndPort: 8999}},
 		}},
 	}
 	got := Render(out)
 	if !strings.Contains(got, "tcp dport 8000-8999") {
 		t.Fatalf("expected port range:\n%s", got)
 	}
 }
 // TestRender_IPBlockExcept — except produces a "saddr != { … }" guard.
 func TestRender_IPBlockExcept(t *testing.T) {
 	out := Output{
 		Isolated: map[Isolation]struct{}{
 			{PodKey: "ns/web", Direction: DirIngress}: {},
 		},
 		Rules: []Rule{{
 			PodKey: "ns/web", HostIface: "f1",
 			PodIPs:     []net.IP{mustIP("10.0.0.1")},
 			Direction:  DirIngress, Action: ActionAccept,
 			PeerCIDRs:  []*net.IPNet{mustNet("10.0.0.0/8")},
 			PeerExcept: []*net.IPNet{mustNet("10.99.0.0/16")},
 			Ports:      []PortMatch{{}},
 		}},
 	}
 	got := Render(out)
 	if !strings.Contains(got, "ip saddr { 10.0.0.0/8 }") {
 		t.Fatalf("expected ipBlock cidr:\n%s", got)
 	}
 	if !strings.Contains(got, "ip saddr != { 10.99.0.0/16 }") {
 		t.Fatalf("expected ipBlock except:\n%s", got)
 	}
 }
 // TestRender_AllowAllPeers — empty PeerCIDRs/PeerExcept means "any peer";
 // the rule should emit an unconditional accept (modulo port).
 func TestRender_AllowAllPeers(t *testing.T) {
 	out := Output{
 		Isolated: map[Isolation]struct{}{
 			{PodKey: "ns/web", Direction: DirIngress}: {},
 		},
 		Rules: []Rule{{
 			PodKey: "ns/web", HostIface: "f1",
 			PodIPs:    []net.IP{mustIP("2001:db8::1")},
 			Direction: DirIngress, Action: ActionAccept,
 			Ports: []PortMatch{{Protocol: "tcp", Port: 443}},
 		}},
 	}
 	got := Render(out)
 	if !strings.Contains(got, "tcp dport 443 accept") {
 		t.Fatalf("expected unconditional tcp/443 allow:\n%s", got)
 	}
 	// Should NOT have a saddr/daddr filter (empty peers).
 	if strings.Contains(got, "ip6 saddr {") || strings.Contains(got, "ip saddr {") {
 		t.Fatalf("expected no peer filter:\n%s", got)
 	}
 }
 // TestRender_Determinism — same input → byte-identical output.
 func TestRender_Determinism(t *testing.T) {
 	out := Output{
 		Isolated: map[Isolation]struct{}{
 			{PodKey: "ns/web", Direction: DirIngress}: {},
 			{PodKey: "ns/db", Direction: DirEgress}:   {},
 		},
 		Rules: []Rule{
 			{PodKey: "ns/web", HostIface: "f1", PodIPs: []net.IP{mustIP("2001:db8::1")},
 				Direction: DirIngress, Action: ActionAccept,
 				PeerCIDRs: []*net.IPNet{mustNet("2001:db8::5/128"), mustNet("2001:db8::3/128")},
 				Ports:     []PortMatch{{Protocol: "tcp", Port: 80}}},
 			{PodKey: "ns/db", HostIface: "f2", PodIPs: []net.IP{mustIP("2001:db8::2")},
 				Direction: DirEgress, Action: ActionAccept,
 				PeerCIDRs: []*net.IPNet{mustNet("2001:db8::aa/128")},
 				Ports:     []PortMatch{{}}},
 		},
 	}
 	a := Render(out)
 	b := Render(out)
 	if a != b {
 		t.Fatalf("Render not deterministic:\nA=\n%s\nB=\n%s", a, b)
 	}
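 	// And peers in the rule must be sorted (we deliberately gave 5 then 3).
 	// Check both are present first: strings.Index returns -1 for a miss,
 	// which would otherwise let the ordering check pass vacuously.
 	i3, i5 := strings.Index(a, "2001:db8::3/128"), strings.Index(a, "2001:db8::5/128")
 	if i3 < 0 || i5 < 0 || i3 > i5 {
 		t.Fatalf("peer CIDRs missing or not sorted within rule:\n%s", a)
 	}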
 }
 // TestRender_EgressDirection checks that egress rules jump on iifname and
 // filter peers by daddr (the peer is the destination).
 func TestRender_EgressDirection(t *testing.T) {
 	out := Output{
 		Isolated: map[Isolation]struct{}{
 			{PodKey: "ns/web", Direction: DirEgress}: {},
 		},
 		Rules: []Rule{{
 			PodKey: "ns/web", HostIface: "f1",
 			PodIPs:    []net.IP{mustIP("2001:db8::1")},
 			Direction: DirEgress, Action: ActionAccept,
 			PeerCIDRs: []*net.IPNet{mustNet("2001:db8::aa/128")},
 			Ports:     []PortMatch{{Protocol: "tcp", Port: 53}},
 		}},
 	}
 	got := Render(out)
 	// Base-chain jump for egress matches iifname only.
 	if !strings.Contains(got, `iifname "f1" jump pod_`) {
 		t.Fatalf("missing egress base-chain jump:\n%s", got)
 	}
 	// Peer filter for egress matches the *destination* (the peer is downstream).
 	if !strings.Contains(got, "ip6 daddr { 2001:db8::aa/128 }") {
 		t.Fatalf("expected daddr peer filter for egress:\n%s", got)
 	}
 }
 func mustNet(s string) *net.IPNet {
 	_, n, err := net.ParseCIDR(s)
 	if err != nil {
 		panic(err)
 	}
 	return n
 }
@@ -0,0 +1,443 @@
 package netpol
 import (
 	"fmt"
 	"net"
 	"sort"
 	netv1 "k8s.io/api/networking/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/labels"
 )
 // Inputs is the world-view the translator consumes. All fields are owned
 // by the caller; the translator does not mutate them.
 type Inputs struct {
 	// LocalPods are the pods scheduled on this node that have a committed
 	// flock allocation. Only these pods get rules — peers may live
 	// elsewhere.
 	LocalPods []Pod
 	// PeerPods is the cluster-wide pod set used to resolve podSelector +
 	// namespaceSelector peers. It is fine to include the local pods here
 	// too; duplicates are deduped by (namespace, name).
 	PeerPods []PeerPod
 	// Namespaces is the cluster's full Namespace set. Used for
 	// namespaceSelector matching.
 	Namespaces []Namespace
 	// Policies is every NetworkPolicy in the cluster. The translator
 	// filters down to those that select at least one local pod.
 	Policies []netv1.NetworkPolicy
 }
 // Output is the result of one translation pass.
 type Output struct {
 	// Rules is the flat ordered list of allow rules to render. The
 	// renderer groups them by (PodKey, Direction) into chains.
 	Rules []Rule
 	// Isolated is the set of (PodKey, Direction) pairs whose chain must
 	// have a default-deny policy. A pod selected by at least one policy
 	// in a given direction shows up here. The renderer uses this to
 	// decide whether to emit a chain at all and what its base policy is.
 	Isolated map[Isolation]struct{}
 	// Pods carries the HostIface + IPs for every local pod referenced
 	// by the policy world, including pods that produced only isolation
 	// (default-deny) without any allow rules. The renderer needs this
 	// because such a pod has no Rule to lift the HostIface from.
 	Pods map[string]LocalPod // key = namespace/name
 }
 // Isolation is the (PodKey, Direction) key of the Isolated map.
 type Isolation struct {
 	PodKey    string
 	Direction Direction
 }
 // Translate runs the translation pass. It is a pure function: the same
 // Inputs always produce semantically equal Output. Rules are appended in
 // LocalPods order and, within each pod, in sorted policy order (see
 // canonicalisePolicies).
 //
 // The error return is reserved for unrecoverable malformed input. Per-rule
 // translation errors are reported through warn and the offending policy or
 // sub-rule is skipped, so that a single broken policy can't take down
 // enforcement for a whole node. warn receives a human-readable message for
 // each skip; pass nil to drop the messages silently.
 func Translate(in Inputs, warn func(string)) (Output, error) {
 	if warn == nil {
 		warn = func(string) {}
 	}
 	out := Output{
 		Isolated: map[Isolation]struct{}{},
 		Pods:     map[string]LocalPod{},
 	}
 	policies := canonicalisePolicies(in.Policies)
 	nsByName := indexNamespaces(in.Namespaces)
 	peerPodsByNS := indexPeerPods(in.PeerPods)
 	for _, pod := range in.LocalPods {
 		if len(pod.IPs) == 0 {
 			continue // no allocation yet; translator skips
 		}
 		key := pod.Namespace + "/" + pod.Name
 		// Find every policy in pod.Namespace whose podSelector matches.
 		// A NetworkPolicy only ever selects pods in its own namespace;
 		// that's how the NetworkPolicy spec defines it.
 		for _, p := range policies {
 			if p.Namespace != pod.Namespace {
 				continue
 			}
 			sel, err := metav1.LabelSelectorAsSelector(&p.Spec.PodSelector)
 			if err != nil {
 				warn(fmt.Sprintf("policy %s/%s: invalid podSelector: %v", p.Namespace, p.Name, err))
 				continue
 			}
 			if !sel.Matches(labels.Set(pod.Labels)) {
 				continue
 			}
 			ingress, egress := policyDirections(&p)
 			if ingress || egress {
 				out.Pods[key] = LocalPod{
 					PodKey:    key,
 					HostIface: pod.HostIface,
 					IPs:       append([]net.IP(nil), pod.IPs...),
 				}
 			}
 			if ingress {
 				out.Isolated[Isolation{PodKey: key, Direction: DirIngress}] = struct{}{}
 			}
 			if egress {
 				out.Isolated[Isolation{PodKey: key, Direction: DirEgress}] = struct{}{}
 			}
 			// Translate ingress rules.
 			if ingress {
 				for ri, r := range p.Spec.Ingress {
 					rules, err := buildIngressRules(pod, r, p.Namespace, nsByName, peerPodsByNS)
 					if err != nil {
 						warn(fmt.Sprintf("policy %s/%s ingress[%d]: %v", p.Namespace, p.Name, ri, err))
 						continue
 					}
 					out.Rules = append(out.Rules, rules...)
 				}
 			}
 			// Translate egress rules.
 			if egress {
 				for ri, r := range p.Spec.Egress {
 					rules, err := buildEgressRules(pod, r, p.Namespace, nsByName, peerPodsByNS)
 					if err != nil {
 						warn(fmt.Sprintf("policy %s/%s egress[%d]: %v", p.Namespace, p.Name, ri, err))
 						continue
 					}
 					out.Rules = append(out.Rules, rules...)
 				}
 			}
 		}
 	}
 	return out, nil
 }
 // policyDirections reports which directions a NetworkPolicy isolates.
 //
 // Per the spec, the PolicyTypes field is the source of truth when set;
 // when omitted, isolation is inferred from which rule lists are populated
 // (Ingress always; Egress only if Spec.Egress is non-empty).
 func policyDirections(p *netv1.NetworkPolicy) (ingress, egress bool) {
 	if len(p.Spec.PolicyTypes) > 0 {
 		for _, t := range p.Spec.PolicyTypes {
 			switch t {
 			case netv1.PolicyTypeIngress:
 				ingress = true
 			case netv1.PolicyTypeEgress:
 				egress = true
 			}
 		}
 		return
 	}
 	ingress = true
 	egress = len(p.Spec.Egress) > 0
 	return
 }
 // buildIngressRules expands one NetworkPolicyIngressRule into Rule(s).
 // One Rule per allowed peer-set; each Rule carries the full Ports filter
 // from the source rule.
 func buildIngressRules(
 	pod Pod,
 	r netv1.NetworkPolicyIngressRule,
 	policyNS string,
 	nsByName map[string]Namespace,
 	peerPodsByNS map[string][]PeerPod,
 ) ([]Rule, error) {
 	ports, err := translatePorts(r.Ports)
 	if err != nil {
 		return nil, err
 	}
 	peers, err := translatePeers(r.From, policyNS, nsByName, peerPodsByNS)
 	if err != nil {
 		return nil, err
 	}
 	return assembleRules(pod, DirIngress, peers, ports), nil
 }
 // buildEgressRules is the egress mirror of buildIngressRules.
 func buildEgressRules(
 	pod Pod,
 	r netv1.NetworkPolicyEgressRule,
 	policyNS string,
 	nsByName map[string]Namespace,
 	peerPodsByNS map[string][]PeerPod,
 ) ([]Rule, error) {
 	ports, err := translatePorts(r.Ports)
 	if err != nil {
 		return nil, err
 	}
 	peers, err := translatePeers(r.To, policyNS, nsByName, peerPodsByNS)
 	if err != nil {
 		return nil, err
 	}
 	return assembleRules(pod, DirEgress, peers, ports), nil
 }
 // peerSet is the resolved peer information for one rule's From / To list.
 type peerSet struct {
 	// allowAll is true when the rule has no peers at all (an empty From /
 	// To list, which the spec defines as "from anywhere"). It overrides
 	// CIDRs and Except.
 	allowAll bool
 	// CIDRs is the union of every IP / CIDR contributed by the rule's
 	// peer entries (resolved Pod IPs, namespace pods, and ipBlock.cidr).
 	CIDRs []*net.IPNet
 	// Except is the union of every ipBlock.except entry across the rule.
 	Except []*net.IPNet
 }
 // translatePeers resolves a list of NetworkPolicyPeer entries into a
 // peerSet. Selector peers contribute the resolved pod CIDRs; ipBlock
 // peers contribute their cidr plus any except entries.
 func translatePeers(
 	peers []netv1.NetworkPolicyPeer,
 	policyNS string,
 	nsByName map[string]Namespace,
 	peerPodsByNS map[string][]PeerPod,
 ) (peerSet, error) {
 	if len(peers) == 0 {
 		return peerSet{allowAll: true}, nil
 	}
 	out := peerSet{}
 	for i, p := range peers {
 		switch {
 		case p.IPBlock != nil:
 			_, cidr, err := net.ParseCIDR(p.IPBlock.CIDR)
 			if err != nil {
 				return peerSet{}, fmt.Errorf("peer[%d] ipBlock.cidr %q: %w", i, p.IPBlock.CIDR, err)
 			}
 			out.CIDRs = append(out.CIDRs, cidr)
 			for j, ex := range p.IPBlock.Except {
 				_, exNet, err := net.ParseCIDR(ex)
 				if err != nil {
 					return peerSet{}, fmt.Errorf("peer[%d] ipBlock.except[%d] %q: %w", i, j, ex, err)
 				}
 				out.Except = append(out.Except, exNet)
 			}
 		case p.PodSelector != nil || p.NamespaceSelector != nil:
 			ips, err := resolvePodNamespacePeer(p, policyNS, nsByName, peerPodsByNS)
 			if err != nil {
 				return peerSet{}, fmt.Errorf("peer[%d]: %w", i, err)
 			}
 			out.CIDRs = append(out.CIDRs, ips...)
 		default:
 			return peerSet{}, fmt.Errorf("peer[%d] is empty (must set ipBlock, podSelector, or namespaceSelector)", i)
 		}
 	}
 	return out, nil
 }
 // resolvePodNamespacePeer walks the cluster's peer-pod set and returns
 // /128 (v6) and /32 (v4) CIDRs for each pod that matches the (possibly
 // combined) pod + namespace selectors.
 //
 // Selector semantics from the NetworkPolicy spec:
 //
 //   - podSelector + namespaceSelector both nil → handled upstream.
 //   - podSelector set, namespaceSelector nil → match in the policy's
 //     own namespace.
 //   - podSelector nil, namespaceSelector set → match every pod in
 //     namespaces that match the namespaceSelector.
 //   - both set → AND: pod must be in a matching namespace AND match
 //     the podSelector.
 //
 // An empty (non-nil) selector matches everything in scope.
 func resolvePodNamespacePeer(
 	p netv1.NetworkPolicyPeer,
 	policyNS string,
 	nsByName map[string]Namespace,
 	peerPodsByNS map[string][]PeerPod,
 ) ([]*net.IPNet, error) {
 	var podSel, nsSel labels.Selector
 	if p.PodSelector != nil {
 		s, err := metav1.LabelSelectorAsSelector(p.PodSelector)
 		if err != nil {
 			return nil, fmt.Errorf("podSelector: %w", err)
 		}
 		podSel = s
 	}
 	if p.NamespaceSelector != nil {
 		s, err := metav1.LabelSelectorAsSelector(p.NamespaceSelector)
 		if err != nil {
 			return nil, fmt.Errorf("namespaceSelector: %w", err)
 		}
 		nsSel = s
 	}
 	// Decide which namespaces are in scope.
 	var inScope []string
 	if nsSel == nil {
 		// Pod-only selector → just the policy's own namespace.
 		inScope = []string{policyNS}
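 	} else {
 		for name, ns := range nsByName {
 			if nsSel.Matches(labels.Set(ns.Labels)) {
 				inScope = append(inScope, name)
 			}
 		}
 		// Map iteration order is random; sort so the resolved CIDR
 		// order (and therefore Translate's output) is stable.
 		sort.Strings(inScope)
 	}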
 	var out []*net.IPNet
 	for _, ns := range inScope {
 		for _, pp := range peerPodsByNS[ns] {
 			if podSel != nil && !podSel.Matches(labels.Set(pp.Labels)) {
 				continue
 			}
 			for _, ip := range pp.IPs {
 				out = append(out, ipToHostCIDR(ip))
 			}
 		}
 	}
 	return out, nil
 }
 // translatePorts converts NetworkPolicyPort entries into PortMatch.
 //
 // A nil/empty Ports list on a NetworkPolicy rule means "all ports" by
 // spec; we represent that as a single zero-valued PortMatch (any proto,
 // any port) so the renderer can emit a single rule rather than a chain
 // of port-equality matches.
 func translatePorts(ports []netv1.NetworkPolicyPort) ([]PortMatch, error) {
 	if len(ports) == 0 {
 		return []PortMatch{{}}, nil
 	}
 	var out []PortMatch
 	for i, p := range ports {
 		var protoStr string
 		if p.Protocol != nil {
 			switch *p.Protocol {
 			case "TCP":
 				protoStr = "tcp"
 			case "UDP":
 				protoStr = "udp"
 			case "SCTP":
 				protoStr = "sctp"
 			default:
 				return nil, fmt.Errorf("port[%d]: protocol %q not supported", i, *p.Protocol)
 			}
 		} else {
 			// Spec default: TCP. The empty protocol ("any") is reserved
 			// for a rule with no Ports entries at all, handled above;
 			// an explicit entry that omits the protocol means TCP.
 			protoStr = "tcp"
 		}
 		var port, endPort int
 		if p.Port != nil {
 			if p.Port.Type != 0 { // intstr.Int = 0; intstr.String = 1
 				return nil, fmt.Errorf("port[%d]: named ports are not yet supported", i)
 			}
 			port = int(p.Port.IntVal)
 		}
 		if p.EndPort != nil {
 			endPort = int(*p.EndPort)
 			if endPort < port {
 				return nil, fmt.Errorf("port[%d]: endPort %d < port %d", i, endPort, port)
 			}
 		}
 		out = append(out, PortMatch{Protocol: protoStr, Port: port, EndPort: endPort})
 	}
 	return out, nil
 }
 // assembleRules builds the Rule for one source rule from its resolved
 // peer-set and port list. It emits a single Rule rather than a
 // cross-product: the peer-set is the expensive shared field, so ports go
 // inline. allowAll peers result in a Rule with no PeerCIDRs, which the
 // renderer treats as "any peer".
 func assembleRules(pod Pod, dir Direction, peers peerSet, ports []PortMatch) []Rule {
 	if !peers.allowAll && len(peers.CIDRs) == 0 {
 		// Selector matched no peers (e.g. podSelector for a label that
 		// no live pod has). Emit nothing — the rule cannot allow any
 		// real traffic. The pod stays in default-deny for this rule.
 		return nil
 	}
 	r := Rule{
 		PodKey:    pod.Namespace + "/" + pod.Name,
 		HostIface: pod.HostIface,
 		PodIPs:    append([]net.IP(nil), pod.IPs...),
 		Direction: dir,
 		Action:    ActionAccept,
 		Ports:     append([]PortMatch(nil), ports...),
 	}
 	if !peers.allowAll {
 		r.PeerCIDRs = append([]*net.IPNet(nil), peers.CIDRs...)
 		r.PeerExcept = append([]*net.IPNet(nil), peers.Except...)
 	}
 	return []Rule{r}
 }
 // canonicalisePolicies sorts the policy slice by (namespace, name) so the
 // translator's output is deterministic regardless of informer event order.
 func canonicalisePolicies(p []netv1.NetworkPolicy) []netv1.NetworkPolicy {
 	out := append([]netv1.NetworkPolicy(nil), p...)
 	sort.Slice(out, func(i, j int) bool {
 		if out[i].Namespace != out[j].Namespace {
 			return out[i].Namespace < out[j].Namespace
 		}
 		return out[i].Name < out[j].Name
 	})
 	return out
 }
 func indexNamespaces(nss []Namespace) map[string]Namespace {
 	out := make(map[string]Namespace, len(nss))
 	for _, ns := range nss {
 		out[ns.Name] = ns
 	}
 	return out
 }
 func indexPeerPods(pods []PeerPod) map[string][]PeerPod {
 	out := map[string][]PeerPod{}
 	for _, p := range pods {
 		out[p.Namespace] = append(out[p.Namespace], p)
 	}
 	// Sort each namespace's pod list by (name) so the translator's IP
 	// ordering is stable.
 	for k := range out {
 		sort.Slice(out[k], func(i, j int) bool { return out[k][i].Name < out[k][j].Name })
 	}
 	return out
 }
 // ipToHostCIDR returns ip/32 (v4) or ip/128 (v6) — the smallest CIDR
 // covering exactly that one address.
 func ipToHostCIDR(ip net.IP) *net.IPNet {
 	if v4 := ip.To4(); v4 != nil {
 		return &net.IPNet{IP: v4, Mask: net.CIDRMask(32, 32)}
 	}
 	return &net.IPNet{IP: ip.To16(), Mask: net.CIDRMask(128, 128)}
 }
@@ -0,0 +1,147 @@
 package netpol
 import (
 	"net"
 	"strings"
 	"testing"
 	corev1 "k8s.io/api/core/v1"
 	netv1 "k8s.io/api/networking/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/util/intstr"
 )
 // FuzzTranslate_AndRender stitches the Translator and Renderer together
 // against synthetic NetworkPolicies built from fuzzed bytes. We are not
 // trying to produce *valid* policies — the goal is to confirm that:
 //
 //  1. Neither stage panics on weird input.
 //  2. Render output is balanced (every "{" has a matching "}").
 //  3. Rendering twice is byte-stable.
 //  4. The Pods set in Output is consistent with Isolated (every isolated
 //     PodKey has a matching entry in Pods).
 //
 // The warn callback is a no-op: warnings are expected for malformed
 // input and are not part of the properties under test.
 func FuzzTranslate_AndRender(f *testing.F) {
 	type seed struct {
 		policyNS, policyName        string
 		podSelectorKey, podSelValue string
 		peerSelectorKey, peerSelV   string
 		peerNS, peerName, peerIP    string
 		port                        uint16
 		ipBlockCIDR, ipBlockExcept  string
 	}
 	for _, s := range []seed{
 		{policyNS: "ns1", policyName: "p1", podSelectorKey: "app", podSelValue: "web", port: 80},
 		{policyNS: "ns1", policyName: "p1", peerSelectorKey: "app", peerSelV: "client", peerNS: "ns1", peerName: "c1", peerIP: "2001:db8::aa", port: 443},
 		{policyNS: "ns1", policyName: "p1", ipBlockCIDR: "10.0.0.0/8", ipBlockExcept: "10.99.0.0/16", port: 0},
 		{policyNS: "", policyName: ""}, // pathological
 		{policyNS: "ns1", policyName: "p1", podSelectorKey: "app\x00", podSelValue: "web\nnewline"},
 		{policyNS: "ns1", policyName: "p1", port: 65535},
 		{policyNS: "ns1", policyName: "p1", port: 1},
 	} {
 		f.Add(s.policyNS, s.policyName, s.podSelectorKey, s.podSelValue,
 			s.peerSelectorKey, s.peerSelV, s.peerNS, s.peerName, s.peerIP,
 			s.port, s.ipBlockCIDR, s.ipBlockExcept)
 	}
 	f.Fuzz(func(t *testing.T,
 		policyNS, policyName,
 		podSelectorKey, podSelValue,
 		peerSelectorKey, peerSelV,
 		peerNS, peerName, peerIP string,
 		port uint16,
 		ipBlockCIDR, ipBlockExcept string,
 	) {
 		// Build a synthetic policy.
 		policy := netv1.NetworkPolicy{
 			ObjectMeta: metav1.ObjectMeta{Namespace: policyNS, Name: policyName},
 			Spec: netv1.NetworkPolicySpec{
 				PolicyTypes: []netv1.PolicyType{netv1.PolicyTypeIngress},
 			},
 		}
 		if podSelectorKey != "" {
 			policy.Spec.PodSelector = metav1.LabelSelector{
 				MatchLabels: map[string]string{podSelectorKey: podSelValue},
 			}
 		} else {
 			policy.Spec.PodSelector = metav1.LabelSelector{}
 		}
 		ingress := netv1.NetworkPolicyIngressRule{}
 		if peerSelectorKey != "" {
 			ingress.From = append(ingress.From, netv1.NetworkPolicyPeer{
 				PodSelector: &metav1.LabelSelector{
 					MatchLabels: map[string]string{peerSelectorKey: peerSelV},
 				},
 			})
 		}
 		if ipBlockCIDR != "" {
 			peer := netv1.NetworkPolicyPeer{
 				IPBlock: &netv1.IPBlock{CIDR: ipBlockCIDR},
 			}
 			if ipBlockExcept != "" {
 				peer.IPBlock.Except = []string{ipBlockExcept}
 			}
 			ingress.From = append(ingress.From, peer)
 		}
 		if port != 0 {
 			tcp := corev1.ProtocolTCP
 			p := intstr.FromInt32(int32(port))
 			ingress.Ports = append(ingress.Ports, netv1.NetworkPolicyPort{
 				Protocol: &tcp, Port: &p,
 			})
 		}
 		policy.Spec.Ingress = append(policy.Spec.Ingress, ingress)
 		// Local pod, possibly matching the policy.
 		pod := Pod{
 			Namespace: "ns1", Name: "web",
 			Labels:    map[string]string{podSelectorKey: podSelValue, "app": "web"},
 			HostIface: "flock00000001",
 			IPs:       []net.IP{mustIP("2001:db8::1")},
 		}
 		// Peer pod, possibly matching the peer selector.
 		var peers []PeerPod
 		if peerName != "" {
 			peerIPParsed := net.ParseIP(peerIP)
 			if peerIPParsed != nil {
 				peers = append(peers, PeerPod{
 					Namespace: peerNS, Name: peerName,
 					Labels: map[string]string{peerSelectorKey: peerSelV},
 					IPs:    []net.IP{peerIPParsed},
 				})
 			}
 		}
 		out, err := Translate(Inputs{
 			LocalPods: []Pod{pod},
 			PeerPods:  peers,
 			Namespaces: []Namespace{
 				{Name: "ns1", Labels: map[string]string{"kubernetes.io/metadata.name": "ns1"}},
 			},
 			Policies: []netv1.NetworkPolicy{policy},
 		}, func(string) {})
 		if err != nil {
 			return // any error is acceptable
 		}
 		// Property: every isolated PodKey appears in Output.Pods.
 		for iso := range out.Isolated {
 			if _, ok := out.Pods[iso.PodKey]; !ok {
 				t.Fatalf("isolated %s has no Pods entry", iso.PodKey)
 			}
 		}
 		script := Render(out)
 		// Property: balanced braces.
 		if got := strings.Count(script, "{") - strings.Count(script, "}"); got != 0 {
 			t.Fatalf("unbalanced braces (%d):\n%s", got, script)
 		}
 		// Property: deterministic (run again, compare).
 		script2 := Render(out)
 		if script != script2 {
 			t.Fatalf("Render not deterministic")
 		}
 	})
 }
@@ -0,0 +1,452 @@
 package netpol
 import (
 	"net"
 	"testing"
 	corev1 "k8s.io/api/core/v1"
 	netv1 "k8s.io/api/networking/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/util/intstr"
 )
 func mustIP(s string) net.IP {
 	ip := net.ParseIP(s)
 	if ip == nil {
 		panic("bad IP: " + s)
 	}
 	return ip
 }
 func newPolicy(ns, name string, mods ...func(*netv1.NetworkPolicy)) netv1.NetworkPolicy {
 	p := netv1.NetworkPolicy{
 		ObjectMeta: metav1.ObjectMeta{Namespace: ns, Name: name},
 		Spec:       netv1.NetworkPolicySpec{},
 	}
 	for _, m := range mods {
 		m(&p)
 	}
 	return p
 }
 func tcpPort(port int) netv1.NetworkPolicyPort {
 	proto := corev1.ProtocolTCP
 	p := intstr.FromInt32(int32(port))
 	return netv1.NetworkPolicyPort{Protocol: &proto, Port: &p}
 }
 // emptySelector returns a selector that matches everything (`{}`).
 func emptySelector() *metav1.LabelSelector {
 	return &metav1.LabelSelector{}
 }
 func selectorMatching(kv map[string]string) *metav1.LabelSelector {
 	return &metav1.LabelSelector{MatchLabels: kv}
 }
 // isolationFor reports whether podKey is isolated for ingress (in) and
 // for egress (eg).
 func isolationFor(out Output, podKey string) (in, eg bool) {
 	if _, ok := out.Isolated[Isolation{PodKey: podKey, Direction: DirIngress}]; ok {
 		in = true
 	}
 	if _, ok := out.Isolated[Isolation{PodKey: podKey, Direction: DirEgress}]; ok {
 		eg = true
 	}
 	return
 }
 // TestTranslate_NoPolicies — pod with no matching policy is unrestricted.
 func TestTranslate_NoPolicies(t *testing.T) {
 	pod := Pod{
 		Namespace: "ns1", Name: "p1",
 		Labels:    map[string]string{"app": "web"},
 		HostIface: "flock00000001",
 		IPs:       []net.IP{mustIP("2001:db8::1")},
 	}
 	out, err := Translate(Inputs{LocalPods: []Pod{pod}}, nil)
 	if err != nil {
 		t.Fatal(err)
 	}
 	if len(out.Rules) != 0 {
 		t.Fatalf("expected no rules, got %d", len(out.Rules))
 	}
 	in, eg := isolationFor(out, "ns1/p1")
 	if in || eg {
 		t.Fatalf("pod should not be isolated: in=%v eg=%v", in, eg)
 	}
 }
 // TestTranslate_DefaultDenyIngress checks that a policy with empty Ingress
 // and PolicyTypes = [Ingress] selects the pod and isolates it; no allow
 // rules are emitted.
 func TestTranslate_DefaultDenyIngress(t *testing.T) {
 	pod := Pod{
 		Namespace: "ns1", Name: "web",
 		Labels:    map[string]string{"app": "web"},
 		HostIface: "flock00000001",
 		IPs:       []net.IP{mustIP("2001:db8::1")},
 	}
 	policy := newPolicy("ns1", "default-deny", func(p *netv1.NetworkPolicy) {
 		p.Spec.PodSelector = *emptySelector()
 		p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
 	})
 	out, err := Translate(Inputs{
 		LocalPods: []Pod{pod},
 		Policies:  []netv1.NetworkPolicy{policy},
 	}, nil)
 	if err != nil {
 		t.Fatal(err)
 	}
 	if len(out.Rules) != 0 {
 		t.Fatalf("expected no rules from a deny-all, got %d", len(out.Rules))
 	}
 	in, eg := isolationFor(out, "ns1/web")
 	if !in {
 		t.Fatalf("ingress should be isolated")
 	}
 	if eg {
 		t.Fatalf("egress should NOT be isolated (policy only set ingress)")
 	}
 }
 // TestTranslate_DefaultDenyEgress_InferredFromEgressList — when
 // PolicyTypes is omitted but Spec.Egress is non-empty, egress should
 // also be isolated by inference.
 func TestTranslate_DefaultDenyEgress_InferredFromEgressList(t *testing.T) {
 	pod := Pod{
 		Namespace: "ns1", Name: "web",
 		Labels:    map[string]string{"app": "web"},
 		HostIface: "f1", IPs: []net.IP{mustIP("2001:db8::1")},
 	}
 	policy := newPolicy("ns1", "egress-rule", func(p *netv1.NetworkPolicy) {
 		p.Spec.PodSelector = *emptySelector()
 		p.Spec.Egress = []netv1.NetworkPolicyEgressRule{{}}
 	})
 	out, _ := Translate(Inputs{LocalPods: []Pod{pod}, Policies: []netv1.NetworkPolicy{policy}}, nil)
 	in, eg := isolationFor(out, "ns1/web")
 	if !in || !eg {
 		t.Fatalf("both directions should be isolated: in=%v eg=%v", in, eg)
 	}
 }
 // TestTranslate_PodSelectorPeer checks a peer that is a single pod in the
 // same namespace, identified by label.
 func TestTranslate_PodSelectorPeer(t *testing.T) {
 	web := Pod{
 		Namespace: "ns1", Name: "web",
 		Labels:    map[string]string{"app": "web"},
 		HostIface: "f1", IPs: []net.IP{mustIP("2001:db8::1")},
 	}
 	clientIP := mustIP("2001:db8::2")
 	peer := PeerPod{
 		Namespace: "ns1", Name: "client",
 		Labels: map[string]string{"app": "client"},
 		IPs:    []net.IP{clientIP},
 	}
 	policy := newPolicy("ns1", "allow-from-client", func(p *netv1.NetworkPolicy) {
 		p.Spec.PodSelector = *selectorMatching(map[string]string{"app": "web"})
 		p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
 		p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
 			From: []netv1.NetworkPolicyPeer{{
 				PodSelector: selectorMatching(map[string]string{"app": "client"}),
 			}},
 			Ports: []netv1.NetworkPolicyPort{tcpPort(80)},
 		}}
 	})
 	out, err := Translate(Inputs{
 		LocalPods: []Pod{web},
 		PeerPods:  []PeerPod{peer},
 		Policies:  []netv1.NetworkPolicy{policy},
 	}, nil)
 	if err != nil {
 		t.Fatal(err)
 	}
 	if len(out.Rules) != 1 {
 		t.Fatalf("expected 1 rule, got %d: %+v", len(out.Rules), out.Rules)
 	}
 	r := out.Rules[0]
 	if r.PodKey != "ns1/web" || r.Direction != DirIngress {
 		t.Fatalf("rule has wrong subject: %+v", r)
 	}
 	if len(r.PeerCIDRs) != 1 || !r.PeerCIDRs[0].IP.Equal(clientIP) {
 		t.Fatalf("peer CIDR wrong: %+v", r.PeerCIDRs)
 	}
 	if len(r.Ports) != 1 || r.Ports[0].Protocol != "tcp" || r.Ports[0].Port != 80 {
 		t.Fatalf("port wrong: %+v", r.Ports)
 	}
 }
 // TestTranslate_NamespaceSelector — peer is "every pod in any namespace
 // with label tier=trusted".
 func TestTranslate_NamespaceSelector(t *testing.T) {
 	web := Pod{
 		Namespace: "ns1", Name: "web",
 		Labels:    map[string]string{"app": "web"},
 		HostIface: "f1", IPs: []net.IP{mustIP("2001:db8::1")},
 	}
 	out, err := Translate(Inputs{
 		LocalPods: []Pod{web},
 		Namespaces: []Namespace{
 			{Name: "ns1", Labels: map[string]string{}},
 			{Name: "trusted-1", Labels: map[string]string{"tier": "trusted"}},
 			{Name: "trusted-2", Labels: map[string]string{"tier": "trusted"}},
 			{Name: "untrusted", Labels: map[string]string{"tier": "wild"}},
 		},
 		PeerPods: []PeerPod{
 			{Namespace: "trusted-1", Name: "a", IPs: []net.IP{mustIP("2001:db8::a")}},
 			{Namespace: "trusted-2", Name: "b", IPs: []net.IP{mustIP("2001:db8::b")}},
 			{Namespace: "untrusted", Name: "x", IPs: []net.IP{mustIP("2001:db8::ff")}},
 		},
 		Policies: []netv1.NetworkPolicy{newPolicy("ns1", "allow-trusted", func(p *netv1.NetworkPolicy) {
 			p.Spec.PodSelector = *emptySelector()
 			p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
 			p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
 				From: []netv1.NetworkPolicyPeer{{
 					NamespaceSelector: selectorMatching(map[string]string{"tier": "trusted"}),
 				}},
 			}}
 		})},
 	}, nil)
 	if err != nil {
 		t.Fatal(err)
 	}
 	if len(out.Rules) != 1 {
 		t.Fatalf("expected 1 rule, got %d", len(out.Rules))
 	}
 	got := map[string]bool{}
 	for _, c := range out.Rules[0].PeerCIDRs {
 		got[c.IP.String()] = true
 	}
 	if !got["2001:db8::a"] || !got["2001:db8::b"] {
 		t.Fatalf("trusted pod IPs missing: %v", got)
 	}
 	if got["2001:db8::ff"] {
 		t.Fatalf("untrusted pod IP leaked into rule")
 	}
 }
 // TestTranslate_IPBlockWithExcept — ipBlock with an except range.
 func TestTranslate_IPBlockWithExcept(t *testing.T) {
 	pod := Pod{
 		Namespace: "ns1", Name: "web", HostIface: "f1",
 		Labels: map[string]string{"app": "web"},
 		IPs:    []net.IP{mustIP("10.0.0.1")},
 	}
 	policy := newPolicy("ns1", "ipblock", func(p *netv1.NetworkPolicy) {
 		p.Spec.PodSelector = *emptySelector()
 		p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
 		p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
 			From: []netv1.NetworkPolicyPeer{{
 				IPBlock: &netv1.IPBlock{
 					CIDR:   "10.0.0.0/8",
 					Except: []string{"10.99.0.0/16", "10.42.42.0/24"},
 				},
 			}},
 		}}
 	})
 	out, err := Translate(Inputs{
 		LocalPods: []Pod{pod},
 		Policies:  []netv1.NetworkPolicy{policy},
 	}, nil)
 	if err != nil {
 		t.Fatal(err)
 	}
 	if len(out.Rules) != 1 {
 		t.Fatalf("expected 1 rule, got %d", len(out.Rules))
 	}
 	r := out.Rules[0]
 	if len(r.PeerCIDRs) != 1 || r.PeerCIDRs[0].String() != "10.0.0.0/8" {
 		t.Fatalf("peer CIDR wrong: %v", r.PeerCIDRs)
 	}
 	if len(r.PeerExcept) != 2 {
 		t.Fatalf("expected 2 except, got %d", len(r.PeerExcept))
 	}
 }
 // TestTranslate_AllowAllPeers — empty From list means "from anywhere".
 func TestTranslate_AllowAllPeers(t *testing.T) {
 	pod := Pod{
 		Namespace: "ns1", Name: "web", HostIface: "f1",
 		Labels: map[string]string{"app": "web"},
 		IPs:    []net.IP{mustIP("2001:db8::1")},
 	}
 	policy := newPolicy("ns1", "allow-all-on-port", func(p *netv1.NetworkPolicy) {
 		p.Spec.PodSelector = *emptySelector()
 		p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
 		p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
 			Ports: []netv1.NetworkPolicyPort{tcpPort(443)},
 		}}
 	})
 	out, _ := Translate(Inputs{LocalPods: []Pod{pod}, Policies: []netv1.NetworkPolicy{policy}}, nil)
 	if len(out.Rules) != 1 {
 		t.Fatalf("expected 1 rule, got %d", len(out.Rules))
 	}
 	r := out.Rules[0]
 	if len(r.PeerCIDRs) != 0 || len(r.PeerExcept) != 0 {
 		t.Fatalf("expected allow-all peers, got CIDRs=%v Except=%v", r.PeerCIDRs, r.PeerExcept)
 	}
 }
 // TestTranslate_AllowAllPorts — empty Ports list means "all ports".
 func TestTranslate_AllowAllPorts(t *testing.T) {
 	pod := Pod{
 		Namespace: "ns1", Name: "web", HostIface: "f1",
 		Labels: map[string]string{"app": "web"},
 		IPs:    []net.IP{mustIP("2001:db8::1")},
 	}
 	policy := newPolicy("ns1", "allow-from-all", func(p *netv1.NetworkPolicy) {
 		p.Spec.PodSelector = *emptySelector()
 		p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
 		p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
 			From: []netv1.NetworkPolicyPeer{{
 				PodSelector: emptySelector(),
 			}},
 		}}
 	})
 	peer := PeerPod{
 		Namespace: "ns1", Name: "x",
 		IPs: []net.IP{mustIP("2001:db8::aa")},
 	}
 	out, _ := Translate(Inputs{
 		LocalPods: []Pod{pod}, PeerPods: []PeerPod{peer},
 		Policies: []netv1.NetworkPolicy{policy},
 	}, nil)
 	if len(out.Rules) != 1 {
 		t.Fatalf("expected 1 rule, got %d", len(out.Rules))
 	}
 	r := out.Rules[0]
 	if len(r.Ports) != 1 || r.Ports[0] != (PortMatch{}) {
 		t.Fatalf("expected single any-port match, got %+v", r.Ports)
 	}
 }
 // TestTranslate_PortRange — endPort field.
 func TestTranslate_PortRange(t *testing.T) {
 	pod := Pod{
 		Namespace: "ns1", Name: "web", HostIface: "f1",
 		Labels: map[string]string{"app": "web"},
 		IPs:    []net.IP{mustIP("2001:db8::1")},
 	}
 	policy := newPolicy("ns1", "range", func(p *netv1.NetworkPolicy) {
 		p.Spec.PodSelector = *emptySelector()
 		p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
 		proto := corev1.ProtocolTCP
 		port := intstr.FromInt32(8000)
 		end := int32(8999)
 		p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
 			Ports: []netv1.NetworkPolicyPort{{Protocol: &proto, Port: &port, EndPort: &end}},
 		}}
 	})
 	out, _ := Translate(Inputs{LocalPods: []Pod{pod}, Policies: []netv1.NetworkPolicy{policy}}, nil)
 	if len(out.Rules) != 1 || out.Rules[0].Ports[0].Port != 8000 || out.Rules[0].Ports[0].EndPort != 8999 {
 		t.Fatalf("range not preserved: %+v", out.Rules)
 	}
 }
 // TestTranslate_NamedPortRejected — named ports aren't supported yet;
 // translator must skip the rule and warn.
 func TestTranslate_NamedPortRejected(t *testing.T) {
 	pod := Pod{
 		Namespace: "ns1", Name: "web", HostIface: "f1",
 		Labels: map[string]string{"app": "web"},
 		IPs:    []net.IP{mustIP("2001:db8::1")},
 	}
 	proto := corev1.ProtocolTCP
 	named := intstr.FromString("http")
 	policy := newPolicy("ns1", "named", func(p *netv1.NetworkPolicy) {
 		p.Spec.PodSelector = *emptySelector()
 		p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
 		p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
 			Ports: []netv1.NetworkPolicyPort{{Protocol: &proto, Port: &named}},
 		}}
 	})
 	var warns []string
 	out, _ := Translate(Inputs{LocalPods: []Pod{pod}, Policies: []netv1.NetworkPolicy{policy}}, func(s string) {
 		warns = append(warns, s)
 	})
 	if len(out.Rules) != 0 {
 		t.Fatalf("expected named-port rule to be skipped")
 	}
 	if len(warns) == 0 {
 		t.Fatalf("expected a warning about named ports")
 	}
 	// The pod should still be isolated since the policy selected it.
 	in, _ := isolationFor(out, "ns1/web")
 	if !in {
 		t.Fatalf("pod should be isolated even when its rule is dropped")
 	}
 }
 // TestTranslate_PolicyScopedToNamespace checks that a policy in nsA does
 // NOT select pods in nsB even if their labels match.
 func TestTranslate_PolicyScopedToNamespace(t *testing.T) {
 	a := Pod{Namespace: "nsA", Name: "p", HostIface: "f1",
 		Labels: map[string]string{"app": "web"}, IPs: []net.IP{mustIP("2001:db8::1")}}
 	b := Pod{Namespace: "nsB", Name: "p", HostIface: "f2",
 		Labels: map[string]string{"app": "web"}, IPs: []net.IP{mustIP("2001:db8::2")}}
 	policy := newPolicy("nsA", "deny", func(p *netv1.NetworkPolicy) {
 		p.Spec.PodSelector = *selectorMatching(map[string]string{"app": "web"})
 		p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
 	})
 	out, _ := Translate(Inputs{LocalPods: []Pod{a, b}, Policies: []netv1.NetworkPolicy{policy}}, nil)
 	inA, _ := isolationFor(out, "nsA/p")
 	inB, _ := isolationFor(out, "nsB/p")
 	if !inA {
 		t.Fatalf("nsA/p should be isolated")
 	}
 	if inB {
 		t.Fatalf("nsB/p must NOT be isolated by a policy in nsA")
 	}
 }
 // TestTranslate_PodWithoutAllocationSkipped — pod with no IPs is silently
 // skipped (its rule could not match any traffic anyway).
 func TestTranslate_PodWithoutAllocationSkipped(t *testing.T) {
 	pod := Pod{Namespace: "ns1", Name: "p", HostIface: "f1",
 		Labels: map[string]string{"app": "web"}}
 	policy := newPolicy("ns1", "deny", func(p *netv1.NetworkPolicy) {
 		p.Spec.PodSelector = *emptySelector()
 		p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
 	})
 	out, _ := Translate(Inputs{LocalPods: []Pod{pod}, Policies: []netv1.NetworkPolicy{policy}}, nil)
 	in, _ := isolationFor(out, "ns1/p")
 	if in {
 		t.Fatalf("pod without IP should not appear in output")
 	}
 }
 // TestTranslate_Determinism checks that translating the same Inputs twice
 // yields matching outputs. This is a smoke test: it compares only the rule
 // count, each rule's PodKey, and each rule's peer-CIDR count.
 func TestTranslate_Determinism(t *testing.T) {
 	pod := Pod{
 		Namespace: "ns1", Name: "web", HostIface: "f1",
 		Labels: map[string]string{"app": "web"},
 		IPs:    []net.IP{mustIP("2001:db8::1")},
 	}
 	peers := []PeerPod{
 		{Namespace: "ns1", Name: "z", Labels: map[string]string{"app": "client"}, IPs: []net.IP{mustIP("2001:db8::2")}},
 		{Namespace: "ns1", Name: "a", Labels: map[string]string{"app": "client"}, IPs: []net.IP{mustIP("2001:db8::3")}},
 	}
 	policies := []netv1.NetworkPolicy{
 		newPolicy("ns1", "z-second", func(p *netv1.NetworkPolicy) {
 			p.Spec.PodSelector = *emptySelector()
 			p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
 			p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
 				From: []netv1.NetworkPolicyPeer{{
 					PodSelector: selectorMatching(map[string]string{"app": "client"}),
 				}},
 			}}
 		}),
 	}
 	in := Inputs{LocalPods: []Pod{pod}, PeerPods: peers, Policies: policies}
 	a, _ := Translate(in, nil)
 	b, _ := Translate(in, nil)
 	if len(a.Rules) != len(b.Rules) {
 		t.Fatalf("rule count differs: %d vs %d", len(a.Rules), len(b.Rules))
 	}
 	for i := range a.Rules {
 		if a.Rules[i].PodKey != b.Rules[i].PodKey || len(a.Rules[i].PeerCIDRs) != len(b.Rules[i].PeerCIDRs) {
 			t.Fatalf("rule[%d] differs", i)
 		}
 	}
 }
@@ -0,0 +1,147 @@
 package netpol
 import "net"
 // Direction is the NetworkPolicy direction, named from the *pod's*
 // perspective (matching the NetworkPolicy API). "Ingress" is traffic
 // arriving at the pod; "Egress" is traffic the pod initiates.
 //
 // Note that on the host this maps the opposite way at the veth: an
 // Ingress rule matches packets whose oifname is the pod's host-side veth
 // (the kernel is forwarding into the pod), and an Egress rule matches
 // packets whose iifname is the pod's host-side veth (the kernel just
 // received from the pod).
 type Direction int
 const (
 	DirIngress Direction = iota
 	DirEgress
 )
 // String returns the lower-case wire form ("ingress" / "egress").
 func (d Direction) String() string {
 	if d == DirEgress {
 		return "egress"
 	}
 	return "ingress"
 }
 // Pod is the local-pod information the translator needs. The reconciler
 // populates this from its store of CNI allocations — every pod with a
 // committed allocation on this node appears here.
 type Pod struct {
 	// Namespace + Name uniquely identify the pod.
 	Namespace string
 	Name      string
 	// Labels are the pod labels. NetworkPolicy.Spec.PodSelector matches
 	// against these.
 	Labels map[string]string
 	// HostIface is the host-side veth name (e.g. "flock1a2b3c4d"). All
 	// rules guarding this pod hook off iifname/oifname == HostIface.
 	HostIface string
 	// IPs are the pod's eth0 addresses (IPv6 and/or IPv4). Empty means
 	// the agent has no allocation for this pod yet — translator should
 	// skip such pods.
 	IPs []net.IP
 }
 // PeerPod is a (potentially remote) pod whose IPs may be referenced as a
 // NetworkPolicy peer. The translator resolves podSelector +
 // namespaceSelector peers to their IPs by walking the cluster-wide
 // peer-pod set.
 type PeerPod struct {
 	Namespace string
 	Name      string
 	Labels    map[string]string
 	IPs       []net.IP
 }
 // Namespace carries just enough metadata for namespaceSelector matching.
 type Namespace struct {
 	Name   string
 	Labels map[string]string
 }
 // LocalPod is the renderer-visible subset of a local pod — just enough
 // to anchor a base-chain jump. Carried in Output so the renderer can
 // emit chains for default-deny pods that have no explicit allow rules.
 type LocalPod struct {
 	PodKey    string
 	HostIface string
 	IPs       []net.IP
 }
 // PortMatch is one allowed (protocol, port) tuple. EndPort is inclusive;
 // when zero the rule matches the single Port.
 type PortMatch struct {
 	Protocol string // "tcp", "udp", "sctp"; empty means "any of the three"
 	Port     int    // 1..65535. Zero means "any port".
 	EndPort  int    // 0 if not a range; otherwise inclusive range end.
 }
 // Rule is the canonical intermediate representation between the translator
 // and the renderer. One Rule is one accept-line in the rendered nft
 // script. A pod's chain is the ordered concatenation of every Rule whose
 // PodKey matches; any packet that falls off the end is denied by the
 // trailing default-deny verdict (the chain has policy drop).
 //
 // PeerCIDRs are OR'd together, then PeerExcept is subtracted. Empty
 // PeerCIDRs + empty PeerExcept means "any source/destination".
 type Rule struct {
 	// PodKey is namespace/name of the pod this rule guards. Used by the
 	// renderer to slot the rule into the correct chain.
 	PodKey string
 	// HostIface is the pod's host-side veth name; the renderer uses it
 	// to anchor the base-chain jump.
 	HostIface string
 	// PodIPs are the pod's eth0 addresses. The base chain matches on
 	// (oifname == HostIface AND daddr ∈ PodIPs) for ingress, and
 	// (iifname == HostIface AND saddr ∈ PodIPs) for egress, so packets
 	// that aren't destined to / from the actual pod address don't get
 	// counted as policy-protected.
 	PodIPs []net.IP
 	// Direction is Ingress or Egress, named from the pod's perspective.
 	Direction Direction
 	// Action is ActionAccept for explicit allows; default-deny is implicit
 	// in the chain's policy drop and is not represented as a Rule.
 	// (Reserved for future deny-list semantics like AdminNetworkPolicy.)
 	Action Action
 	// PeerCIDRs are the addresses of allowed peers. OR'd together.
 	// Empty means "any peer".
 	PeerCIDRs []*net.IPNet
 	// PeerExcept narrows PeerCIDRs by subtracting these ranges. Only
 	// meaningful with non-empty PeerCIDRs (it comes from
 	// ipBlock.except, which requires ipBlock.cidr).
 	PeerExcept []*net.IPNet
 	// Ports is the set of allowed (protocol, port) tuples. Empty means
 	// "any port / any protocol".
 	Ports []PortMatch
 }
 // Action is the verdict emitted by a Rule.
 type Action int
 const (
 	// ActionAccept lets the packet through. The default-deny is implicit
 	// in the chain policy.
 	ActionAccept Action = iota
 	// ActionDrop is reserved for future use (AdminNetworkPolicy /
 	// BaselineAdminNetworkPolicy explicit denies). Not produced by the
 	// v1 translator.
 	ActionDrop
 )
 // String returns the nft-syntax verdict.
 func (a Action) String() string {
 	if a == ActionDrop {
 		return "drop"
 	}
 	return "accept"
 }
@@ -0,0 +1,56 @@
 package agent
 import (
 	"net"
 	"code.fritzlab.net/fritzlab/flock/pkg/agent/netpol"
 )
 // collectLocalPods bridges the agent's allocation store + pod informer
 // cache into the netpol-package input shape. It returns one Pod per
 // committed allocation that has a matching pod in the informer cache;
 // allocations whose pod was just deleted (DEL race) are skipped.
 //
 // Called on every netpol reconcile pass, so it must be cheap. The work
 // here is O(allocations) and reads from in-memory maps only.
 func collectLocalPods(store *Store, pods *PodCache) []netpol.Pod {
 	allocs := store.Snapshot()
 	out := make([]netpol.Pod, 0, len(allocs))
 	for _, a := range allocs {
 		if a.State != StateCommitted {
 			continue
 		}
 		pod, ok := pods.Get(a.Namespace, a.PodName)
 		if !ok {
 			// Pod evicted but DEL hasn't fired yet; nothing to enforce.
 			continue
 		}
 		ips := allocationIPs(a)
 		if len(ips) == 0 {
 			continue
 		}
 		out = append(out, netpol.Pod{
 			Namespace: a.Namespace,
 			Name:      a.PodName,
 			Labels:    pod.Labels,
 			HostIface: HostIfaceName(a.ContainerID),
 			IPs:       ips,
 		})
 	}
 	return out
 }
 func allocationIPs(a Allocation) []net.IP {
 	var out []net.IP
 	if a.IP6 != "" {
 		if ip := net.ParseIP(a.IP6); ip != nil {
 			out = append(out, ip)
 		}
 	}
 	if a.IP4 != "" {
 		if ip := net.ParseIP(a.IP4); ip != nil {
 			out = append(out, ip)
 		}
 	}
 	return out
 }
@@ -2,25 +2,37 @@ package agent
 import (
 	"context"
 	"encoding/json"
 	"fmt"
 	"log/slog"
 	"time"
 	corev1 "k8s.io/api/core/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/types"
 	"k8s.io/client-go/kubernetes"
 	"k8s.io/client-go/rest"
 )
 // nodeStatusFieldManager identifies flock-agent in apiserver field-manager bookkeeping.
 // Server-Side Apply only takes ownership of the fields we send, so other
 // managers (kubelet, kcm) keep their conditions untouched between our writes.
 const nodeStatusFieldManager = "flock-agent"
 // keepNetworkAvailable maintains a NetworkUnavailable=False condition on
 // the node's status. Calico-node sets this False while it owns CNI; on
 // shutdown it sets it to True with reason CalicoIsDown, which adds the
 // node.kubernetes.io/network-unavailable taint and blocks new scheduling.
-// Once flock-agent is in charge, we own the condition.
+// Once flock-agent is in charge, we own that single condition.
 //
-// Re-applies every minute — heartbeat-style — so a stale condition from a
+// Uses Server-Side Apply against the status subresource. NodeStatus.Conditions
 // is a listType=map keyed by `type`, so SSA merges by type — our partial body
 // declares ownership of just the NetworkUnavailable entry and leaves the
 // kubelet-managed conditions (Ready, MemoryPressure, DiskPressure, PIDPressure)
 // alone. A prior implementation used JSON merge-patch with a one-element
 // conditions array, which the apiserver REPLACES (merge-patch on arrays is
 // whole-array semantics) — that race-stripped the kubelet conditions every
 // 60s and produced ~5s flickers in `kubectl get nodes`.
 //
 // Re-applies every minute (heartbeat-style) so a stale condition from a
 // previous CNI is overwritten without an explicit transition.
 func keepNetworkAvailable(ctx context.Context, cfg *rest.Config, node string, logger *slog.Logger) {
 	cs, err := kubernetes.NewForConfig(cfg)
@@ -29,23 +41,29 @@ func keepNetworkAvailable(ctx context.Context, cfg *rest.Config, node string, lo
 		return
 	}
 	apply := func() {
-		now := metav1.Now()
+		now := metav1.Now().UTC().Format(time.RFC3339)
-		patch := map[string]interface{}{
+		// Hand-build the SSA body so we only declare the fields we own.
-			"status": map[string]interface{}{
+		// Force=true lets us reclaim the condition if a previous CNI's
-				"conditions": []corev1.NodeCondition{{
+		// finalizer/cleanup left it owned by a different manager.
-					Type:               corev1.NodeNetworkUnavailable,
+		body := []byte(fmt.Sprintf(`{
-					Status:             corev1.ConditionFalse,
+  "apiVersion": "v1",
-					Reason:             "FlockReady",
+  "kind": "Node",
-					Message:            "flock-agent owns CNI on this node",
+  "metadata": {"name": %q},
-					LastHeartbeatTime:  now,
+  "status": {"conditions": [{
-					LastTransitionTime: now,
+    "type": "NetworkUnavailable",
-				}},
+    "status": "False",
-			},
+    "reason": "FlockReady",
-		}
+    "message": "flock-agent owns CNI on this node",
-		body, _ := json.Marshal(patch)
+    "lastHeartbeatTime": %q,
-		_, err := cs.CoreV1().Nodes().Patch(ctx, node, types.MergePatchType, body, metav1.PatchOptions{}, "status")
+    "lastTransitionTime": %q
  }]}
 }`, node, now, now))
 		force := true
 		_, err := cs.CoreV1().Nodes().Patch(ctx, node, types.ApplyPatchType, body,
 			metav1.PatchOptions{FieldManager: nodeStatusFieldManager, Force: &force},
 			"status")
 		if err != nil {
-			logger.Warn("network-condition: patch failed", "err", err)
+			logger.Warn("network-condition: ssa apply failed", "err", err)
 			return
 		}
 	}
@@ -61,6 +79,3 @@ func keepNetworkAvailable(ctx context.Context, cfg *rest.Config, node string, lo
 		}
 	}
 }
 // silence unused-import warnings on non-Linux builds where this is unused.
 var _ = fmt.Sprintf
@@ -28,6 +28,16 @@ func podReady(pod *corev1.Pod) bool {
 	return false
 }
 // podAnycastEligible reports whether a pod should contribute its IP as a
 // nexthop for its anycast IPs. A pod is eligible when it is Ready AND not
 // being deleted. Once the apiserver sets DeletionTimestamp, kubelet has
 // started teardown — kube-proxy will keep routing for terminationGracePeriod
 // but the pod is on the way out; we should withdraw the nexthop immediately
 // so BGP shifts traffic to a sibling before the pod actually exits.
 func podAnycastEligible(pod *corev1.Pod) bool {
 	return pod.DeletionTimestamp == nil && podReady(pod)
 }
 // PodCache exposes a Get(ns, name) lookup against a node-scoped Pod
 // informer. ADD/DEL handlers consult it to read annotations + labels for
 // IPAM and (later) NetworkPolicy. Callers can subscribe to Ready
@@ -58,7 +68,7 @@ func StartPodInformer(ctx context.Context, cfg *rest.Config, node string, logger
 	_, _ = inf.AddEventHandler(cache.ResourceEventHandlerFuncs{
 		AddFunc: func(obj interface{}) {
-			if pod, ok := obj.(*corev1.Pod); ok && podReady(pod) {
+			if pod, ok := obj.(*corev1.Pod); ok && podAnycastEligible(pod) {
 				pc.fireReady()
 			}
 		},
@@ -68,7 +78,10 @@ func StartPodInformer(ctx context.Context, cfg *rest.Config, node string, logger
 			if oldP == nil || newP == nil {
 				return
 			}
-			if podReady(oldP) != podReady(newP) {
+			// Fire on Ready transition OR DeletionTimestamp transition.
 			// The latter catches "pod was Ready, now being deleted" so the
 			// reconciler withdraws the nexthop before the pod actually exits.
 			if podAnycastEligible(oldP) != podAnycastEligible(newP) {
 				pc.fireReady()
 			}
 		},
@@ -0,0 +1,46 @@
 package agent
 import (
 	"testing"
 	corev1 "k8s.io/api/core/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 )
 func readyPod(deletionTimestamp *metav1.Time) *corev1.Pod {
 	return &corev1.Pod{
 		ObjectMeta: metav1.ObjectMeta{DeletionTimestamp: deletionTimestamp},
 		Status: corev1.PodStatus{
 			Conditions: []corev1.PodCondition{
 				{Type: corev1.PodReady, Status: corev1.ConditionTrue},
 			},
 		},
 	}
 }
 func TestPodAnycastEligible(t *testing.T) {
 	now := metav1.Now()
 	cases := []struct {
 		name string
 		pod  *corev1.Pod
 		want bool
 	}{
 		{"ready, not deleting", readyPod(nil), true},
 		{"ready, but deleting", readyPod(&now), false},
 		{
 			"not ready, not deleting",
 			&corev1.Pod{Status: corev1.PodStatus{Conditions: []corev1.PodCondition{
 				{Type: corev1.PodReady, Status: corev1.ConditionFalse},
 			}}},
 			false,
 		},
 		{"no conditions, not deleting", &corev1.Pod{}, false},
 	}
 	for _, c := range cases {
 		t.Run(c.name, func(t *testing.T) {
 			if got := podAnycastEligible(c.pod); got != c.want {
 				t.Fatalf("got %v want %v", got, c.want)
 			}
 		})
 	}
 }
@@ -6,9 +6,36 @@ import (
 	"context"
 	"fmt"
 	"net"
 	"os"
 	"time"
 	"code.fritzlab.net/fritzlab/flock/pkg/agent/netpol"
 )
 // hostMultipathHashSysctls is the set of node-level sysctls flock-agent
 // best-effort writes at startup. Default policy 0 hashes only on
 // (saddr, daddr); policy 1 adds L4 (sport, dport, proto), giving real
 // per-connection ECMP across multipath nexthops — required for sensible
 // distribution across multiple anycast pods on the same node.
 var hostMultipathHashSysctls = map[string]string{
 	"/proc/sys/net/ipv4/fib_multipath_hash_policy": "1",
 	"/proc/sys/net/ipv6/fib_multipath_hash_policy": "1",
 }
 // applyHostSysctls writes hostMultipathHashSysctls, logging but not failing on
 // errors. flock-agent is privileged so this works in the production
 // DaemonSet; in environments where it doesn't, single-pod-per-node
 // anycast still works (this only affects the multi-pod-per-node case).
 func applyHostSysctls(s *Server) {
 	for path, value := range hostMultipathHashSysctls {
 		if err := os.WriteFile(path, []byte(value), 0o644); err != nil {
 			s.Logger.Warn("set host sysctl", "path", path, "value", value, "err", err)
 			continue
 		}
 		s.Logger.Info("host sysctl set", "path", path, "value", value)
 	}
 }
 // configureRuntime wires Pod informer, IPAM, netlink, and BIRD on a real
 // Linux node. Steps:
 //
@@ -21,6 +48,8 @@ import (
 //  5. Build PodHandler and SetHandlers(add, del, check).
 //  6. Install BIRD blackhole summary routes + render initial config.
 func (s *Server) configureRuntime(ctx context.Context) error {
 	applyHostSysctls(s)
 	if err := s.firstAvailableNodeConfig(ctx, 60*time.Second); err != nil {
 		return err
 	}
@@ -103,15 +132,32 @@ func (s *Server) configureRuntime(ctx context.Context) error {
 		}
 	}()
 	// NetworkPolicy enforcement.
 	world := netpol.NewWorld(s.Logger)
 	if err := world.Start(ctx, s.restCfg); err != nil {
 		return fmt.Errorf("netpol informers: %w", err)
 	}
 	npApplier := &netpol.Applier{}
 	npReconciler := netpol.NewReconciler(world, func() []netpol.Pod {
 		return collectLocalPods(s.Store, pods)
 	}, npApplier, s.Logger)
 	go npReconciler.Run(ctx)
 	handler := &PodHandler{
 		Node:         s.Node,
 		Store:        s.Store,
 		IPAM:         ipam,
 		Pods:         pods,
 		NodeConfig:   s.NodeConfig,
 		Logger:       s.Logger,
 		SetupFunc:    Setup,
 		TeardownFunc: Teardown,
-		AfterCommit:  anycast.Trigger,
+		AfterCommit: func() {
 			anycast.Trigger()
 			// Re-evaluate policy on every CNI ADD/DEL so a brand-new
 			// pod's chain lands before its first packet egresses.
 			npReconciler.Trigger()
 		},
 	}
 	s.RPC.SetHandlers(handler.Add, handler.Del, handler.Check)
 	s.Logger.Info("runtime ready",
@@ -1,6 +1,6 @@
-// Package agent owns the in-process flock-agent runtime: IPAM, netns, state,
+// This file implements the durable per-node allocation file at
-// anycast, and NetworkPolicy. This file implements the durable per-node
+// /var/lib/flock/allocations.json. The package-level doc lives in doc.go.
-// allocation file at /var/lib/flock/allocations.json.
+
 package agent
 import (
@@ -33,6 +33,7 @@ type Allocation struct {
 	IP6         string          `json:"ip6,omitempty"`
 	IP4         string          `json:"ip4,omitempty"`
 	Anycast     []string        `json:"anycast,omitempty"`
 	Addresses   []string        `json:"addresses,omitempty"`
 	State       AllocationState `json:"state"`
 	AllocatedAt time.Time       `json:"allocated_at"`
 }
@@ -1,3 +1,8 @@
 // Package v1alpha1 contains the operator-facing API types for flock.
 //
 // Stability: alpha. The shape of these types may change in incompatible ways
 // between minor releases. CRDs are versioned and the agent reads only its
 // pinned version.
 package v1alpha1
 import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@@ -6,26 +11,77 @@ import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 //
 // The agent reads this on startup and via informer for live updates. There is
 // no controller and no auto-allocation — purely declarative input.
 //
 // A NodeConfig's name MUST equal the Kubernetes node name it configures
 // (NodeConfigs are cluster-scoped). The agent ignores all NodeConfigs whose
 // name does not match its own node.
 type NodeConfigSpec struct {
 	// CIDR6 is the set of IPv6 CIDRs this node owns and advertises as BGP
-	// aggregates. Pod IPv6 addresses are allocated from these.
+	// aggregates. Pod IPv6 addresses are allocated from these. May be empty
 	// only if Defaults disables IPv6 for every pod on this node.
 	CIDR6 []string `json:"cidr6,omitempty"`
 	// CIDR4 is the set of IPv4 CIDRs this node owns and advertises as BGP
-	// aggregates. Pod IPv4 addresses are allocated from these.
+	// aggregates. Pod IPv4 addresses are allocated from these. May be empty
 	// when no pod on this node ever opts into IPv4.
 	CIDR4 []string `json:"cidr4,omitempty"`
 	// BGP configures the BGP sessions this node establishes upstream.
 	BGP BGPSpec `json:"bgp"`
 	// Defaults sets the per-node baseline for which address families a pod
 	// receives when its own annotations don't say. Pod-level
 	// `flock.fritzlab.net/ipv6` and `flock.fritzlab.net/ipv4` annotations
 	// always override these defaults.
 	//
 	// When a field is unset (nil), the agent falls back to its built-in
 	// baseline of IPv6=true, IPv4=true (dual-stack). When the whole Defaults
 	// block is nil, both built-in defaults apply.
 	//
 	// Typical uses:
 	//   - dual-stack node (built-in default): omit Defaults entirely.
 	//   - IPv6-only node:  Defaults: { ipv6: true,  ipv4: false }
 	//   - IPv4-only node:  Defaults: { ipv6: false, ipv4: true  }
 	//
 	// Validation: at least one of IPv6 or IPv4 must end up true after merging
 	// (annotations + defaults + built-in baseline). The agent rejects pods
 	// that resolve to neither.
 	Defaults *FamilyDefaults `json:"defaults,omitempty"`
 }
 // FamilyDefaults is the per-node default for which address families a pod
 // receives when its annotations don't specify. Each field is a pointer so
 // "unset" is distinguishable from explicit "false".
 type FamilyDefaults struct {
 	// IPv6 is the default value for the `flock.fritzlab.net/ipv6` annotation.
 	// nil → fall back to the built-in baseline (true).
 	IPv6 *bool `json:"ipv6,omitempty"`
 	// IPv4 is the default value for the `flock.fritzlab.net/ipv4` annotation.
 	// nil → fall back to the built-in baseline (true).
 	IPv4 *bool `json:"ipv4,omitempty"`
 }
 // BGPSpec describes this node's BGP speaker configuration. Each upstream peer
 // becomes one BGP session in the rendered bird.conf.
 type BGPSpec struct {
 	// ASN is this node's local autonomous system number. flock uses private
 	// ASNs in the 64512-65534 range by convention but accepts any value.
 	ASN uint32 `json:"asn"`
 	// Peers is the set of upstream BGP neighbors. At least one is required
 	// for BGP advertisement to function. Multiple peers of the same family
 	// are allowed (multi-homing).
 	Peers []BGPPeer `json:"peers"`
 }
 // BGPPeer is a single upstream BGP neighbor.
 type BGPPeer struct {
 	// Address is the peer's IP. May be IPv4 or IPv6. The agent picks an
 	// appropriate local source address on the same subnet.
 	Address string `json:"address"`
 	// ASN is the peer's remote ASN.
 	ASN uint32 `json:"asn"`
 }
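The Defaults doc comment above describes a three-level precedence for each address family: pod annotation, then node default, then the built-in baseline of true, with pods that resolve to neither family rejected. A minimal sketch of that merge follows, assuming hypothetical helpers `resolve` and `resolveFamilies` (the agent's real function names and error text are not shown in this diff):

```go
package main

import (
	"errors"
	"fmt"
)

// FamilyDefaults mirrors the API type in the diff above (JSON tags omitted).
type FamilyDefaults struct {
	IPv6 *bool
	IPv4 *bool
}

// resolve returns the first non-nil value in precedence order:
// pod annotation, node default, built-in baseline (true).
func resolve(annotation, nodeDefault *bool) bool {
	if annotation != nil {
		return *annotation
	}
	if nodeDefault != nil {
		return *nodeDefault
	}
	return true
}

// resolveFamilies is a hypothetical helper. ann6 and ann4 stand for the
// parsed pod annotations (nil when the annotation is absent).
func resolveFamilies(ann6, ann4 *bool, d *FamilyDefaults) (v6, v4 bool, err error) {
	var d6, d4 *bool
	if d != nil {
		d6, d4 = d.IPv6, d.IPv4
	}
	v6 = resolve(ann6, d6)
	v4 = resolve(ann4, d4)
	if !v6 && !v4 {
		return false, false, errors.New("pod resolves to neither IPv6 nor IPv4")
	}
	return v6, v4, nil
}

func boolPtr(b bool) *bool { return &b }

func main() {
	// IPv6-only node, pod with no annotations.
	v6only := &FamilyDefaults{IPv6: boolPtr(true), IPv4: boolPtr(false)}
	fmt.Println(resolveFamilies(nil, nil, v6only))

	// Same node, pod opts out of IPv6 as well: rejected.
	fmt.Println(resolveFamilies(boolPtr(false), nil, v6only))
}
```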
@@ -1,5 +1,5 @@
-// Package embed implements ip-algo: deterministic embedding of pod identity
+// Package embed implements ip-algo: deterministic embedding of workload
-// (namespace, pod name, image digest) into the host portion of an IPv6
+// identity (namespace, app name, image) into the host portion of an IPv6
 // address. The mapping is operator-friendly cosmetics — NOT a security
 // boundary. See dfritz-cni.md "IPv6 IID Embedding" for the full spec.
 package embed
@@ -17,17 +17,26 @@ type Field string
 const (
 	FieldNamespace Field = "namespace"
-	FieldPod       Field = "pod"
+	FieldApp       Field = "app"
 	FieldImage     Field = "image"
 )
-// Values carries the inputs for one embedding call. Image holds the SHA-256
+// Values carries the inputs for one embedding call.
-// manifest digest as 64 hex chars when known; otherwise pass the containerID
+//
-// in ImageFallback and we'll FNV-1a-64 it.
+// App is the stable workload identifier — typically the owning Deployment /
 // StatefulSet / DaemonSet name (callers strip the pod-template-hash from
 // ReplicaSet names before passing it in). Caller is responsible for picking
 // the right level of stability; this package just hashes whatever it gets.
 //
 // Image is whatever string the caller wants embedded for the image field;
 // the most common choice is pod.Spec.Containers[0].Image (the spec'd
 // reference). If the caller passes a 64-hex-char SHA-256 digest, the top
 // bits are taken as a hex value; otherwise it is FNV-1a-64'd as a plain
 // string. ImageFallback is used only when Image == "".
 type Values struct {
 	Namespace     string
-	Pod             string
+	App           string
-	Image           string // 64-char hex sha256 manifest digest, or empty
+	Image         string // sha256 hex (64 chars), or any string to FNV; empty → fallback
 	ImageFallback string // typically containerID, used when Image=="".
 }
@@ -127,13 +136,22 @@ func fieldValue(f Field, v Values, bits int) (uint64, error) {
 	switch f {
 	case FieldNamespace:
 		return topBitsFNV(v.Namespace, bits), nil
-	case FieldPod:
+	case FieldApp:
-		return topBitsFNV(v.Pod, bits), nil
+		return topBitsFNV(v.App, bits), nil
 	case FieldImage:
-		if v.Image != "" {
+		if v.Image == "" {
 			return topBitsFNV(v.ImageFallback, bits), nil
 		}
 		// SHA-256 manifest digests are exactly 64 hex chars (with optional
 		// "sha256:" prefix). Anything else — image:tag references like
 		// "traefik:v3", or short SHAs — gets FNV-1a-64'd as a string. This
 		// preserves the original digest behaviour while letting callers
 		// pass pod.Spec.Containers[0].Image directly.
 		s := strings.TrimPrefix(v.Image, "sha256:")
 		if len(s) == 64 && isHex(s) {
 			return topBitsHex(v.Image, bits)
 		}
-		return topBitsFNV(v.ImageFallback, bits), nil
+		return topBitsFNV(v.Image, bits), nil
 	default:
 		return 0, fmt.Errorf("unknown field %q", f)
 	}
@@ -163,6 +181,21 @@ func topBitsHex(s string, bits int) (uint64, error) {
 	return v >> uint(64-bits), nil
 }
 // isHex reports whether every byte in s is a valid hex digit.
 func isHex(s string) bool {
 	for i := 0; i < len(s); i++ {
 		c := s[i]
 		switch {
 		case c >= '0' && c <= '9':
 		case c >= 'a' && c <= 'f':
 		case c >= 'A' && c <= 'F':
 		default:
 			return false
 		}
 	}
 	return true
 }
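fieldValue hashes non-digest strings through topBitsFNV, whose body is not part of this diff. The sketch below is an assumption about its shape, modelled on the `v >> uint(64-bits)` line visible in topBitsHex and on the "FNV-1a-64" wording in the comments; the name `topBitsFNVSketch` is hypothetical:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// topBitsFNVSketch hashes s with 64-bit FNV-1a and keeps the top `bits`
// bits of the digest. It assumes 1 <= bits <= 64; this is a guess at what
// the package's topBitsFNV does, not a copy of it.
func topBitsFNVSketch(s string, bits int) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s)) // hash.Hash.Write never returns an error
	return h.Sum64() >> uint(64-bits)
}

func main() {
	for _, s := range []string{"mail", "stalwart", "traefik:v3.5"} {
		fmt.Printf("%-14q -> %#04x\n", s, topBitsFNVSketch(s, 16))
	}
}
```

Taking the top bits rather than the bottom bits matches topBitsHex, so a digest-shaped Image and an FNV-hashed Image fill a field of the same width.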
 // writeNibble sets the (nibIdx)-th nibble of addr (0 = highest nibble of byte 0).
 func writeNibble(addr net.IP, nibIdx int, nb byte) {
 	bytePos := nibIdx / 2
@@ -0,0 +1,104 @@
 package embed
 import (
 	"net"
 	"testing"
 )
 // FuzzEmbed verifies that Embed never panics and that any successful return
 // keeps the output address inside the requested network.
 func FuzzEmbed(f *testing.F) {
 	type seed struct {
 		prefix   string
 		fields   string // comma-separated, mapped below to []Field
 		ns, app  string
 		image    string
 		fallback string
 		nNibble  byte
 	}
 	for _, s := range []seed{
 		{"2602:817:3000:f001::/64", "namespace,app,image", "mail", "stalwart", "", "ctr", 0xe},
 		{"2001:db8::/64", "namespace", "ns", "a", "", "", 0},
 		{"2001:db8::/96", "app", "", "appname", "", "ctr", 0xf},
 		{"2001:db8::/48", "namespace,app", "ns", "a", "", "ctr", 0x1},
 		{"2001:db8::/120", "namespace", "n", "a", "", "ctr", 0x0},   // 8 host bits (2 nibbles)
 		{"2001:db8::/124", "namespace", "n", "a", "", "ctr", 0x0},   // 4 host bits (1 nibble)
 		{"2001:db8::/127", "namespace", "n", "a", "", "ctr", 0x0},   // not nibble-aligned
 		{"2001:db8::/63", "namespace", "n", "a", "", "ctr", 0x0},    // not nibble-aligned
 		{"2001:db8::/64", "namespace,app,image", "", "", "sha256:abcdef0123456789aabbccddeeff00112233445566778899aabbccddeeff0011", "", 0xa},
 		{"2001:db8::/64", "namespace,app,image", "", "", "traefik:v3.5", "ctr", 0xa},
 		{"2001:db8::/64", "namespace,app,image", "", "", "", "ctr", 0xa},
 		{"2001:db8::/64", "namespace", "🦆", "🐧", "", "", 0},
 		{"2001:db8::/64", "namespace", "ns\x00\x00", "a", "", "", 0},
 	} {
 		f.Add(s.prefix, s.fields, s.ns, s.app, s.image, s.fallback, s.nNibble)
 	}
 	f.Fuzz(func(t *testing.T, prefix, fieldsStr, ns, app, image, fallback string, nNibble byte) {
 		_, network, err := net.ParseCIDR(prefix)
 		if err != nil {
 			return
 		}
 		fields, ok := decodeFields(fieldsStr)
 		if !ok {
 			return
 		}
 		got, err := Embed(network, fields, Values{
 			Namespace:     ns,
 			App:           app,
 			Image:         image,
 			ImageFallback: fallback,
 		}, nNibble)
 		if err != nil {
 			return
 		}
 		if !network.Contains(got) {
 			t.Fatalf("Embed(%s, %v) = %s, outside network", prefix, fields, got)
 		}
 		// Property: low nibble of last byte equals nNibble & 0x0F.
 		if want := nNibble & 0x0F; got[len(got)-1]&0x0F != want {
 			t.Fatalf("low nibble = %x, want %x", got[len(got)-1]&0x0F, want)
 		}
 	})
 }
 func decodeFields(s string) ([]Field, bool) {
 	if s == "" {
 		return nil, false
 	}
 	var out []Field
 	cur := []byte{}
 	flush := func() bool {
 		if len(cur) == 0 {
 			return true
 		}
 		switch string(cur) {
 		case string(FieldNamespace):
 			out = append(out, FieldNamespace)
 		case string(FieldApp):
 			out = append(out, FieldApp)
 		case string(FieldImage):
 			out = append(out, FieldImage)
 		default:
 			return false
 		}
 		cur = cur[:0]
 		return true
 	}
 	for i := 0; i < len(s); i++ {
 		if s[i] == ',' {
 			if !flush() {
 				return nil, false
 			}
 			continue
 		}
 		cur = append(cur, s[i])
 	}
 	if !flush() {
 		return nil, false
 	}
 	if len(out) == 0 {
 		return nil, false
 	}
 	return out, true
 }
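
The `FieldImage` branch in the embed hunk above picks one of three hash inputs: the fallback when `Image` is empty, the raw digest when `Image` is a 64-hex-char SHA-256, and an FNV-1a-64 of the reference string otherwise. The sketch below reproduces that decision in isolation. It is illustrative only: `imageBits` is a hypothetical name, and the digest path assumes `topBitsHex` reads the first 8 bytes of the digest big-endian, which this diff does not show.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"hash/fnv"
	"strings"
)

// topBitsFNV hashes s with FNV-1a-64 and keeps the top `bits` bits.
// Assumes 1 <= bits <= 64.
func topBitsFNV(s string, bits int) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s)) // hash.Hash.Write never returns an error
	return h.Sum64() >> uint(64-bits)
}

// imageBits is a hypothetical stand-in for the FieldImage case.
func imageBits(image, fallback string, bits int) uint64 {
	if image == "" {
		// No image reference at all: hash the fallback identifier.
		return topBitsFNV(fallback, bits)
	}
	s := strings.TrimPrefix(image, "sha256:")
	if len(s) == 64 {
		if raw, err := hex.DecodeString(s); err == nil {
			// Assumption: a digest contributes its leading bits directly.
			return binary.BigEndian.Uint64(raw[:8]) >> uint(64-bits)
		}
	}
	// image:tag references and short SHAs are hashed as plain strings.
	return topBitsFNV(image, bits)
}
```

With this shape, a tag reference such as `traefik:v3.5` yields the same bits on every replica regardless of the fallback value, which is the property the `namespace,app,image` scheme relies on.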
@@ -70,8 +70,8 @@ func TestEmbed_Slash64Deterministic(t *testing.T) {
 	// /64 with 3 fields: 5+5+5+1 nibbles = 64-bit IID.
 	net64 := mustCIDR(t, "2602:817:3000:f001::/64")
 	addr, err := Embed(net64,
-		[]Field{FieldNamespace, FieldPod, FieldImage},
+		[]Field{FieldNamespace, FieldApp, FieldImage},
-		Values{Namespace: "mail", Pod: "stalwart-0", ImageFallback: "container-abc"},
+		Values{Namespace: "mail", App: "stalwart", ImageFallback: "container-abc"},
 		0xe,
 	)
 	if err != nil {
@@ -79,8 +79,8 @@ func TestEmbed_Slash64Deterministic(t *testing.T) {
 	}
 	// Property: same inputs → same output (twice).
 	addr2, err := Embed(net64,
-		[]Field{FieldNamespace, FieldPod, FieldImage},
+		[]Field{FieldNamespace, FieldApp, FieldImage},
-		Values{Namespace: "mail", Pod: "stalwart-0", ImageFallback: "container-abc"},
+		Values{Namespace: "mail", App: "stalwart", ImageFallback: "container-abc"},
 		0xe,
 	)
 	if err != nil {
@@ -101,8 +101,8 @@ func TestEmbed_Slash64Deterministic(t *testing.T) {
 func TestEmbed_DifferentInputsDifferentOutputs(t *testing.T) {
 	net64 := mustCIDR(t, "2602:817:3000:f001::/64")
-	a, _ := Embed(net64, []Field{FieldNamespace, FieldPod}, Values{Namespace: "ns1", Pod: "p1"}, 0)
+	a, _ := Embed(net64, []Field{FieldNamespace, FieldApp}, Values{Namespace: "ns1", App: "p1"}, 0)
-	b, _ := Embed(net64, []Field{FieldNamespace, FieldPod}, Values{Namespace: "ns2", Pod: "p1"}, 0)
+	b, _ := Embed(net64, []Field{FieldNamespace, FieldApp}, Values{Namespace: "ns2", App: "p1"}, 0)
 	if a.Equal(b) {
 		t.Fatalf("different namespace produced identical IID: %s", a)
 	}
@@ -9,6 +9,7 @@ import (
 	"fmt"
 	"net"
 	"sort"
 	"strings"
 	"text/template"
 )
@@ -25,6 +26,14 @@ type NodeBGP struct {
 	// hop self that crt001 accepts).
 	LocalV6 string
 	LocalV4 string
 	// LocalSubnetV6 / LocalSubnetV4 are the directly-connected subnets
 	// (CIDR) the BGP peers live on. When set, the per-peer ipv6 / ipv4
 	// channel uses `import where net != <subnet>` so the gateway can't
 	// re-advertise our own connected /64 (or /24) back to us — accepting
 	// it would override the kernel-connected route and hairpin all
 	// inter-host traffic via the gateway.
 	LocalSubnetV6 string
 	LocalSubnetV4 string
 	// CIDR6 / CIDR4 are the per-node summary aggregates the agent wants
 	// advertised. The agent installs blackhole kernel routes for each so
 	// BIRD's protocol kernel imports them.
@@ -91,7 +100,7 @@ protocol bgp upstream6_{{$i}} {
  neighbor {{$p.Address}} as {{$p.ASN}};
  graceful restart;
  ipv6 {
-    import all;
+    {{if $.LocalSubnetV6}}import where net != {{$.LocalSubnetV6}};{{else}}import all;{{end}}
    next hop self;
    export filter {
      {{range $cidr := $.CIDR6}}if net = {{$cidr}} then accept;
@@ -106,7 +115,7 @@ protocol bgp upstream4_{{$i}} {
  neighbor {{$p.Address}} as {{$p.ASN}};
  graceful restart;
  ipv4 {
-    import all;
+    {{if $.LocalSubnetV4}}import where net != {{$.LocalSubnetV4}};{{else}}import all;{{end}}
    next hop self;
    export filter {
      {{range $cidr := $.CIDR4}}if net = {{$cidr}} then accept;
@@ -118,28 +127,181 @@ protocol bgp upstream4_{{$i}} {
 {{end}}{{end}}`
 // Render produces the bird.conf text.
 //
 // The output is deterministic: the same NodeBGP input always produces the
 // same string. CIDR lists, anycast lists, and peer lists are sorted before
 // templating so that the only way the rendered config changes is when
 // semantically meaningful inputs change. This stability matters because
 // BirdManager compares Render output against the last-written config to
 // avoid superfluous birdc reloads.
 //
 // Render validates every operator-supplied value that flows into the
 // templated output (peer addresses, CIDRs, anycast IPs, source addresses)
 // so a malformed NodeConfig or annotation cannot produce a malformed
 // bird.conf — even one that BIRD would later reject.
 func Render(in NodeBGP) (string, error) {
 	if in.RouterID == "" {
-		return "", fmt.Errorf("RouterID is required")
+		return "", fmt.Errorf("bird render: RouterID is required")
 	}
 	if net.ParseIP(in.RouterID) == nil {
 		return "", fmt.Errorf("bird render: RouterID %q is not a valid IP", in.RouterID)
 	}
 	if in.LocalASN == 0 {
-		return "", fmt.Errorf("LocalASN is required")
+		return "", fmt.Errorf("bird render: LocalASN is required")
 	}
-	// Stable order — important so config changes only when something real
+	if err := validateLocalSource(in.LocalV6, "v6"); err != nil {
-	// changes (avoids needless birdc reloads).
+		return "", err
 	}
 	if err := validateLocalSource(in.LocalV4, "v4"); err != nil {
 		return "", err
 	}
 	if err := validateLocalSubnet(in.LocalSubnetV6, "v6"); err != nil {
 		return "", err
 	}
 	if err := validateLocalSubnet(in.LocalSubnetV4, "v4"); err != nil {
 		return "", err
 	}
 	for i, p := range in.Peers {
 		if err := validatePeer(p); err != nil {
 			return "", fmt.Errorf("bird render: peer[%d]: %w", i, err)
 		}
 	}
 	if err := validateCIDRs(in.CIDR6, "v6"); err != nil {
 		return "", fmt.Errorf("bird render: cidr6: %w", err)
 	}
 	if err := validateCIDRs(in.CIDR4, "v4"); err != nil {
 		return "", fmt.Errorf("bird render: cidr4: %w", err)
 	}
 	if err := validateAnycastIPs(in.Anycast6, "v6"); err != nil {
 		return "", fmt.Errorf("bird render: anycast6: %w", err)
 	}
 	if err := validateAnycastIPs(in.Anycast4, "v4"); err != nil {
 		return "", fmt.Errorf("bird render: anycast4: %w", err)
 	}
 	in = normalize(in)
 	t, err := template.New("bird").Parse(tpl)
 	if err != nil {
-		return "", err
+		return "", fmt.Errorf("bird template parse: %w", err)
 	}
 	var buf bytes.Buffer
 	if err := t.Execute(&buf, in); err != nil {
-		return "", err
+		return "", fmt.Errorf("bird template execute: %w", err)
 	}
 	return buf.String(), nil
 }
 // validatePeer checks that a peer entry has a parseable IP whose family
 // matches its declared Family field, and a non-zero ASN.
 func validatePeer(p Peer) error {
 	if p.ASN == 0 {
 		return fmt.Errorf("ASN must be non-zero")
 	}
 	ip := net.ParseIP(p.Address)
 	if ip == nil {
 		return fmt.Errorf("address %q is not a valid IP", p.Address)
 	}
 	isV4 := ip.To4() != nil
 	switch p.Family {
 	case "v6":
 		if isV4 {
 			return fmt.Errorf("address %q is IPv4 but Family is v6", p.Address)
 		}
 	case "v4":
 		if !isV4 {
 			return fmt.Errorf("address %q is IPv6 but Family is v4", p.Address)
 		}
 	default:
 		return fmt.Errorf("Family %q must be v6 or v4", p.Family)
 	}
 	return nil
 }
 // validateCIDRs parses each entry as a CIDR and rejects family mismatches.
 // fam must be "v6" or "v4".
 func validateCIDRs(cidrs []string, fam string) error {
 	for _, c := range cidrs {
 		_, n, err := net.ParseCIDR(c)
 		if err != nil {
 			return fmt.Errorf("invalid CIDR %q: %w", c, err)
 		}
 		isV4 := n.IP.To4() != nil
 		if fam == "v6" && isV4 {
 			return fmt.Errorf("CIDR %q is IPv4, expected IPv6", c)
 		}
 		if fam == "v4" && !isV4 {
 			return fmt.Errorf("CIDR %q is IPv6, expected IPv4", c)
 		}
 	}
 	return nil
 }
 // validateAnycastIPs parses each entry as a literal IP (no prefix) and rejects
 // family mismatches.
 func validateAnycastIPs(ips []string, fam string) error {
 	for _, s := range ips {
 		ip := net.ParseIP(s)
 		if ip == nil {
 			return fmt.Errorf("invalid IP %q", s)
 		}
 		isV4 := ip.To4() != nil
 		if fam == "v6" && isV4 {
 			return fmt.Errorf("IP %q is IPv4, expected IPv6", s)
 		}
 		if fam == "v4" && !isV4 {
 			return fmt.Errorf("IP %q is IPv6, expected IPv4", s)
 		}
 	}
 	return nil
 }
 // validateLocalSource validates an optional LocalV6/LocalV4 source address.
 // Empty is allowed (BIRD picks its own); non-empty must be a parseable IP of
 // the matching family.
 func validateLocalSource(s, fam string) error {
 	if s == "" {
 		return nil
 	}
 	ip := net.ParseIP(s)
 	if ip == nil {
 		return fmt.Errorf("bird render: Local%s %q is not a valid IP", strings.ToUpper(fam), s)
 	}
 	isV4 := ip.To4() != nil
 	if fam == "v6" && isV4 {
 		return fmt.Errorf("bird render: LocalV6 %q is IPv4", s)
 	}
 	if fam == "v4" && !isV4 {
 		return fmt.Errorf("bird render: LocalV4 %q is IPv6", s)
 	}
 	return nil
 }
 // validateLocalSubnet validates an optional LocalSubnetV6/LocalSubnetV4 CIDR.
 // Empty is allowed (no import filter); non-empty must be a parseable CIDR of
 // the matching family in canonical form (host bits zero) so the BIRD `net !=`
 // comparison matches the route the gateway re-advertises.
 func validateLocalSubnet(s, fam string) error {
 	if s == "" {
 		return nil
 	}
 	ip, n, err := net.ParseCIDR(s)
 	if err != nil {
 		return fmt.Errorf("bird render: LocalSubnet%s %q is not a valid CIDR: %w", strings.ToUpper(fam), s, err)
 	}
 	if !ip.Equal(n.IP) {
 		return fmt.Errorf("bird render: LocalSubnet%s %q has non-zero host bits (want %s)", strings.ToUpper(fam), s, n.String())
 	}
 	isV4 := n.IP.To4() != nil
 	if fam == "v6" && isV4 {
 		return fmt.Errorf("bird render: LocalSubnetV6 %q is IPv4", s)
 	}
 	if fam == "v4" && !isV4 {
 		return fmt.Errorf("bird render: LocalSubnetV4 %q is IPv6", s)
 	}
 	return nil
 }
 func normalize(in NodeBGP) NodeBGP {
 	cp := in
 	cp.CIDR6 = sortedUnique(in.CIDR6)
@@ -0,0 +1,101 @@
 package bird
 import (
 	"strings"
 	"testing"
 )
 // FuzzRender drives the bird template with a wide range of inputs and
 // confirms two safety properties:
 //
 //  1. Render never panics.
 //  2. On nil-error return, the output is deterministic (calling Render
 //     twice with the same input yields byte-identical output) and contains
 //     no unbalanced braces (a smoke test for malformed template branches).
 func FuzzRender(f *testing.F) {
 	type seed struct {
 		routerID string
 		asn      uint32
 		peerAddr string
 		peerASN  uint32
 		cidr6    string
 		cidr4    string
 		anycast6 string
 		anycast4 string
 		localV6  string
 		localV4  string
 		subnet6  string
 		subnet4  string
 	}
 	seeds := []seed{
 		{routerID: "10.0.0.1", asn: 65101, peerAddr: "2001:db8::1", peerASN: 65000, cidr6: "2001:db8:f001::/64"},
 		{routerID: "172.25.25.101", asn: 65101, peerAddr: "172.25.25.1", peerASN: 65000, cidr4: "172.25.210.0/24"},
 		{routerID: "10.0.0.1", asn: 65101, peerAddr: "2001:db8::1", peerASN: 65000, cidr6: "2001:db8:f001::/64", anycast6: "2001:db8:a::1"},
 		{routerID: "10.0.0.1", asn: 65101, peerAddr: "10.0.0.2", peerASN: 65000, cidr4: "10.0.0.0/24", anycast4: "10.255.0.1"},
 		{routerID: "10.0.0.1", asn: 65101},                          // no peer, no cidrs
 		{routerID: "", asn: 65101, peerAddr: "10.0.0.2", peerASN: 1}, // empty routerID → expect error
 		{routerID: "10.0.0.1", asn: 0, peerAddr: "10.0.0.2", peerASN: 1}, // zero ASN → expect error
 		// Backtick-bearing inputs to defend the template against accidental
 		// closure of the raw-string literal.
 		{routerID: "10.0.0.1`", asn: 65101},
 		// Newlines and template-meta in user-supplied addresses
 		{routerID: "10.0.0.1", asn: 65101, peerAddr: "2001:db8::1\n{{kaboom}}", peerASN: 65000, cidr6: "2001:db8:f001::/64"},
 		// LocalSubnet filters set.
 		{routerID: "172.25.25.104", asn: 65104, peerAddr: "2602:817:3000:a25::1", peerASN: 65000, subnet6: "2602:817:3000:a25::/64", subnet4: "172.25.25.0/24"},
 		// Malformed subnet should be rejected by validation, not crash.
 		{routerID: "10.0.0.1", asn: 65101, subnet6: "not-a-cidr"},
 	}
 	for _, s := range seeds {
 		f.Add(s.routerID, s.asn, s.peerAddr, s.peerASN, s.cidr6, s.cidr4, s.anycast6, s.anycast4, s.localV6, s.localV4, s.subnet6, s.subnet4)
 	}
 	f.Fuzz(func(t *testing.T, routerID string, asn uint32, peerAddr string, peerASN uint32, cidr6, cidr4, anycast6, anycast4, localV6, localV4, subnet6, subnet4 string) {
 		in := NodeBGP{
 			RouterID:      routerID,
 			LocalASN:      asn,
 			LocalV6:       localV6,
 			LocalV4:       localV4,
 			LocalSubnetV6: subnet6,
 			LocalSubnetV4: subnet4,
 		}
 		// Add the peer in whichever family it belongs to, if any. FamilyOf
 		// returns "" for non-IPs; that case exercises the "skip unknown
 		// family" branch in the bird agent code path.
 		if fam := FamilyOf(peerAddr); fam != "" {
 			in.Peers = []Peer{{Family: fam, Address: peerAddr, ASN: peerASN}}
 		}
 		if cidr6 != "" {
 			in.CIDR6 = []string{cidr6}
 		}
 		if cidr4 != "" {
 			in.CIDR4 = []string{cidr4}
 		}
 		if anycast6 != "" {
 			in.Anycast6 = []string{anycast6}
 		}
 		if anycast4 != "" {
 			in.Anycast4 = []string{anycast4}
 		}
 		out, err := Render(in)
 		if err != nil {
 			return
 		}
 		// Determinism.
 		out2, err := Render(in)
 		if err != nil {
 			t.Fatalf("Render became flaky: first ok, second %v", err)
 		}
 		if out != out2 {
 			t.Fatalf("Render not deterministic on identical input")
 		}
 		// Smoke test for balanced braces. The template uses `{` and `}`
 		// as BIRD's block delimiters; if our template engine ever
 		// produced an unbalanced output we'd catch it here.
 		if got := strings.Count(out, "{") - strings.Count(out, "}"); got != 0 {
 			t.Fatalf("unbalanced braces: %d", got)
 		}
 	})
 }
@@ -75,6 +75,89 @@ func TestRender_StableOutput(t *testing.T) {
 	}
 }
 func TestRender_LocalSubnetImportFilter(t *testing.T) {
 	out, err := Render(NodeBGP{
 		RouterID:      "172.25.25.104",
 		LocalASN:      65104,
 		Peers:         []Peer{{Family: "v6", Address: "2602:817:3000:a25::1", ASN: 65000}, {Family: "v4", Address: "172.25.25.1", ASN: 65000}},
 		CIDR6:         []string{"2602:817:3000:f004::/64"},
 		CIDR4:         []string{"172.25.214.0/24"},
 		LocalSubnetV6: "2602:817:3000:a25::/64",
 		LocalSubnetV4: "172.25.25.0/24",
 	})
 	if err != nil {
 		t.Fatal(err)
 	}
 	for _, want := range []string{
 		"import where net != 2602:817:3000:a25::/64;",
 		"import where net != 172.25.25.0/24;",
 	} {
 		if !strings.Contains(out, want) {
 			t.Errorf("missing %q in output:\n%s", want, out)
 		}
 	}
 	// Each BGP peer block should use the import filter, not import all.
 	// Slice out just the `protocol bgp ...` stanzas to avoid catching the
 	// kernel proto's legitimate `import all;`.
 	for _, marker := range []string{"protocol bgp upstream6_", "protocol bgp upstream4_"} {
 		idx := strings.Index(out, marker)
 		if idx < 0 {
 			continue
 		}
 		end := strings.Index(out[idx:], "\n}")
 		if end < 0 {
 			continue
 		}
 		stanza := out[idx : idx+end]
 		if strings.Contains(stanza, "import all;") {
 			t.Errorf("BGP stanza still has `import all;`:\n%s", stanza)
 		}
 	}
 }
 func TestRender_LocalSubnetEmpty_FallsBackToImportAll(t *testing.T) {
 	out, err := Render(NodeBGP{
 		RouterID: "10.0.0.1",
 		LocalASN: 65101,
 		Peers:    []Peer{{Family: "v6", Address: "2001:db8::1", ASN: 65000}},
 		CIDR6:    []string{"2001:db8:f001::/64"},
 	})
 	if err != nil {
 		t.Fatal(err)
 	}
 	if !strings.Contains(out, "import all;") {
 		t.Errorf("expected `import all;` when LocalSubnetV6 unset:\n%s", out)
 	}
 }
 func TestRender_LocalSubnetValidation(t *testing.T) {
 	cases := []struct {
 		name    string
 		v6, v4  string
 		wantErr string
 	}{
 		{name: "non-canonical v6", v6: "2602:817:3000:a25::1/64", wantErr: "non-zero host bits"},
 		{name: "non-canonical v4", v4: "172.25.25.1/24", wantErr: "non-zero host bits"},
 		{name: "v6 family mismatch", v6: "172.25.25.0/24", wantErr: "is IPv4"},
 		{name: "v4 family mismatch", v4: "2602:817:3000:a25::/64", wantErr: "is IPv6"},
 		{name: "garbage", v6: "not-a-cidr", wantErr: "not a valid CIDR"},
 	}
 	for _, tc := range cases {
 		t.Run(tc.name, func(t *testing.T) {
 			_, err := Render(NodeBGP{
 				RouterID:      "10.0.0.1",
 				LocalASN:      65101,
 				Peers:         []Peer{{Family: "v6", Address: "2001:db8::1", ASN: 65000}},
 				LocalSubnetV6: tc.v6,
 				LocalSubnetV4: tc.v4,
 			})
 			if err == nil || !strings.Contains(err.Error(), tc.wantErr) {
 				t.Fatalf("want error containing %q, got %v", tc.wantErr, err)
 			}
 		})
 	}
 }
 func TestFamilyOf(t *testing.T) {
 	if FamilyOf("2001:db8::1") != "v6" {
 		t.Fatal("v6 detection broken")
@@ -0,0 +1,13 @@
 go test fuzz v1
 string("0")
 uint32(65101)
 string("0")
 uint32(1)
 string("")
 string("")
 string("")
 string("}")
 string("")
 string("")
 string("")
 string("")
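
The commit message for 9b777ca7d1 says the agent fills `LocalSubnetV6`/`LocalSubnetV4` by masking the peer's address to /64 (v6) or /24 (v4), and `validateLocalSubnet` then insists on canonical form. A minimal sketch of that masking step follows, assuming those fixed prefix lengths; `subnetOf` is a hypothetical helper name, not the agent's actual function.

```go
package main

import (
	"fmt"
	"net"
)

// subnetOf masks a peer address down to the connected subnet it is
// assumed to live on: /24 for IPv4, /64 for IPv6. The result always has
// zero host bits, so it passes a canonical-form check unchanged.
func subnetOf(addr string) (string, error) {
	ip := net.ParseIP(addr)
	if ip == nil {
		return "", fmt.Errorf("%q is not a valid IP", addr)
	}
	if v4 := ip.To4(); v4 != nil {
		mask := net.CIDRMask(24, 32)
		n := net.IPNet{IP: v4.Mask(mask), Mask: mask}
		return n.String(), nil
	}
	mask := net.CIDRMask(64, 128)
	n := net.IPNet{IP: ip.Mask(mask), Mask: mask}
	return n.String(), nil
}
```

Deriving the subnet this way, rather than accepting it as free-form input, means the string compared by BIRD's `net !=` filter is already in the form the gateway would advertise.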
Author	SHA1	Message	Date
Donavan Fritz	580b9afa33	ci: push image to fritzlab-public org flock / release (push) Successful in 47m37s Details This repo was transferred from fritzlab to fritzlab-public so the container package's anonymous-pull access (governed by org visibility in Gitea 1.26.1) remains open after the rest of fritzlab/* flips to limited. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 13:58:56 -05:00
Donavan Fritz	8d6e50c980	deploy: catch-all toleration so DS schedules on not-ready nodes flock / release (push) Successful in 45m40s Details Replaces the explicit toleration list with `operator: Exists`. The previous list lacked node.kubernetes.io/not-ready:NoSchedule, so during a fresh control-plane join the CNI agent couldn't schedule until the node became Ready — but the node can't become Ready without the CNI. Surfaced during host001/host002 PERC migration rebuild. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 09:35:27 -05:00
Donavan Fritz	3d0081780c	ci: migrate to action/ org composite actions flock / release (push) Successful in 3m4s Details	2026-05-06 08:14:35 -05:00
Donavan Fritz	9b777ca7d1	bird: per-peer import filter rejects connected subnet Build flock Image / build (push) Successful in 2m17s Details Without a filter, crt001's `network 2602:817:3000:A25::/64` gets re-advertised to every peer on that subnet. bird installs the BGP /64 with metric 32, beating the kernel-connected route at 256, and all inter-host VLAN-25 traffic hairpins through the gateway — losing PMTU 9000 and ~30x throughput. Broke Plex 2026-05-04: NFS to nas002 capped at 7 MB/s, jumbo blackholed. Add LocalSubnetV6/V4 (CIDR) to NodeBGP. Agent populates by masking the peer's address to /64 (v6) or /24 (v4) — same fritzlab convention already in localAddrSameSubnet. Render emits `import where net != <subnet>;` per BGP channel when set, falls back to `import all;` otherwise so existing tests stay green. Defence in depth: with the matching outbound route-map on crt001 (ROUTE_MAP_CLUSTER_OUT_V{4,6}) the agent now refuses the leak on its own if the router filter ever drifts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 21:03:59 -05:00
Donavan Fritz	a17d33e182	agent: addresses annotation replaces IPAM allocation Build flock Image / build (push) Successful in 5m27s Details When flock.fritzlab.net/addresses provides a v6 or v4, the IP becomes the pod's primary IP for that family — bound to eth0, default route off it, on-link host route via setHostRoute, and a per-pod /128 or /32 in BGP. IPAM no longer allocates a private IP alongside it. The pod ends up with exactly the operator-supplied addresses on eth0 (plus any extras beyond the first-of-family, which keep the pre-existing layered behavior). This is the fix the original addresses-annotation work missed: bug #1 allocated a private IP next to the public one (so VPN-routed clients could land on the private path on Plex). Promoting addresses-supplied IPs into the IPAM-style routing slot keeps the public IP as the only primary IP visible from outside. Three pieces: - annotations.go: reject pods whose addresses/anycast IP family is disabled (ipv6/ipv4 annotation or NodeConfig default). Both annotation types rely on the family being enabled for return-path routing. - handlers.go: peel first v6 + first v4 from Addresses into res.IP6/IP4; suppress IPAM for those families; skip IPAM call entirely if both families are addresses-supplied. - anycast_linux.go: extend renderBird to advertise any IPAM IP that's outside the node's BGP aggregate as a per-pod /32 or /128. This is what makes 142.202.202.166 reachable when host004's pod CIDR is 172.25.214.0/24 — the addresses-promoted IP isn't covered by the aggregate. Tests: 7 new annotation tests covering the conflict cases (ipv4=false + addresses-v4, NodeConfig default + addresses-v4, etc.) plus 5 unit tests for the splitAddressesPrimary helper. README updated with the addresses-replaces-IPAM behavior, the addresses-vs-anycast comparison, the conflict rule, and a Plex-style example. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 09:46:48 -05:00
Donavan Fritz	40e13037b5	agent: revert CNI result addresses inclusion; document k8s limit Build flock Image / build (push) Successful in 1m36s Details Kubernetes limits pod.status.podIPs to one IPv4 + one IPv6 per pod. Additional IPs in the CNI result are silently dropped by kubelet, making the resultFromAllocation change in `4a60c00` a no-op. Revert it and add a comment documenting the constraint so the intent is clear. Addresses IPs remain fully functional: bound to eth0, advertised via BGP, visible inside the pod — just not reflected in pod status. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 18:37:05 -05:00
Donavan Fritz	4a60c004c3	agent: include addresses IPs in CNI result Build flock Image / build (push) Successful in 1m37s Details resultFromAllocation now appends Addresses entries to the CNI result so they appear in pod.status.podIPs. Kubernetes and workloads that inspect pod metadata (e.g. Plex remote-access detection) see the public IPs alongside the IPAM-allocated ones. Anycast IPs are intentionally excluded — they're shared across replicas and must not appear as per-pod IPs in Kubernetes. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 18:11:17 -05:00
Donavan Fritz	2daa2a21f3	agent: add flock.fritzlab.net/addresses annotation (eth0 static IPs) Build flock Image / build (push) Successful in 3m23s Details Like anycast, addresses IPs are advertised via BGP (/128+/32) and get host routes via the AnycastReconciler. The sole difference: they are assigned to pod eth0 instead of lo, so workloads that inspect their primary interface (e.g. Plex remote-access detection) see the public IP directly. - annotations.go: annAddresses const, Addresses []net.IP in ParsedAnnotations - state.go: Addresses []string persisted in allocations.json - anycast.go: resolveAnycastTargets processes Anycast+Addresses together - netns_linux.go: configurePodSide assigns Addresses to eth0 - netns_stub.go: mirror Addresses field for non-Linux builds - handlers.go: thread Addresses through ADD path Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 17:50:49 -05:00
Donavan Fritz	362a1e01ce	ci: trigger dispatch after scheduler reset Build flock Image / build (push) Successful in 1m56s Details	2026-04-26 17:53:55 -05:00
Donavan Fritz	222006240c	ci: use fritzlab/build-image@v1 Build flock Image / build (push) Has been cancelled Details Replaces inline docker login + metadata + build-push + tag-cleanup with the shared build-image composite action. Standardizes on CI_BOT_TOKEN (drops REGISTRY_PASSWORD).	2026-04-26 09:32:46 -05:00
Donavan Fritz	e00579f7ca	nodecondition: SSA the NetworkUnavailable condition (don't merge-patch) Build flock Image / build (push) Has been cancelled Details The previous implementation used JSON merge-patch (types.MergePatchType) with a one-element conditions array. JSON merge-patch on arrays is whole-array replacement, so every 60s flock-agent stomped over the kubelet-managed conditions (Ready, MemoryPressure, DiskPressure, PIDPressure), leaving only NetworkUnavailable on the node — until kubelet's next status post (~5s later) re-set them. Symptom: `kubectl get nodes` flickered, with one node briefly showing Unknown each polling tick. k9s lit up red on rotating nodes. (kube- controller-manager is also a write contender and was correctly noted in the field-managers list.) Switch to Server-Side Apply against the status subresource with fieldManager=flock-agent and Force=true. NodeStatus.Conditions is a listType=map keyed by `type`, so SSA merges by type — we declare ownership of only the NetworkUnavailable entry and leave kubelet's entries untouched. Force lets us reclaim the condition if a previous CNI manager (e.g. calico-node finalizer leftovers) still owns it. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-26 08:55:03 -05:00
Donavan Fritz	a6a50fd73f	ci: retrigger build (run #685 + #686 hit transient github.com timeout / cancellation) Build flock Image / build (push) Has been cancelled Details Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 22:56:33 -05:00
Donavan Fritz	c61b12204c	anycast: drop pods from nexthop set on DeletionTimestamp Build flock Image / build (push) Has been cancelled Details Previously the AnycastReconciler kept a pod in the nexthop set as long as its PodReady condition was True. During a rolling restart that produces a window after kubelet has accepted SIGTERM (DeletionTimestamp set, pod still Ready until probes observe shutdown) where BGP still advertises a path through the dying pod's veth — in-flight requests get RST'd when the container actually exits. Fix: introduce podAnycastEligible(pod) = !DeletionTimestamp && Ready, swap it in at the AnycastReconciler's isReady callback, and fire the ready-change callback when DeletionTimestamp transitions (the informer UpdateFunc previously only fired on Ready transitions). Result: as soon as the apiserver marks a pod for deletion, the reconciler withdraws the local nexthop and BIRD reannounces the route without it. Sibling replicas absorb traffic before the pod's terminationGracePeriod elapses. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 22:24:50 -05:00
Donavan Fritz	e9d3eef2cc	netpol: accept established+related at top of every pod chain Build flock Image / build (push) Has been cancelled Details K8s NetworkPolicy applies to the start of new connections; reply packets for established flows (and ICMP related) must not be matched against the explicit allow set. The pod ingress chain previously had only explicit dport allows + a final drop, so any reply to a pod-initiated outbound where the reply's dport (the ephemeral source port) wasn't in the allow set got dropped. Hit in production 2026-04-26: garage's `garage-admin-restrict` NP allowed dports 3900/80/3901/3903 only. Garage uses kubernetes_discovery to find peers — outbound to kube-apiserver succeeded, replies returned to ephemeral source ports, dropped → "Layout not ready" cluster-wide. Fix: emit `ct state established,related accept` as the first rule in every pod_<hash>_(ingress|egress) chain. Regression test added. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 22:22:39 -05:00
Donavan Fritz	8dd109866e	ci: re-trigger build (runs #682-#683 failed transient github.com timeout) Build flock Image / build (push) Has been cancelled Details Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 11:53:35 -05:00
Donavan Fritz	d5161e09d3	deploy: drop fritzlab.net/cni-test toleration Build flock Image / build (push) Has been cancelled Details Migration off Calico is complete; host001/host004 no longer carry the cni-test taint. The toleration is now dead config.	2026-04-25 11:42:48 -05:00
Donavan Fritz	65b2fb5b17	ip-algo: rename `pod` field to `app`; image from pod spec Build flock Image / build (push) Has been cancelled Details The `pod` field hashed pod.Name, which differs per replica because of the ReplicaSet pod-template-hash + 5-char random suffix. With namespace,pod,image, all replicas of the same Deployment got distinct hextets even though they were the same workload. Replace `pod` with `app` — a stable workload identifier derived from the controller chain: - Deployment → ReplicaSet → Pod: strip the pod-template-hash suffix from the RS name (`traefik-789df685f` → `traefik`). - StatefulSet/DaemonSet/Job → Pod: use controller name as-is. - Bare pod: pod name. Image now comes from pod.Spec.Containers[0].Image (the spec'd reference). 64-hex-char values are treated as sha256 digests and parsed as before; everything else (image:tag, short SHA) is FNV-1a-64'd as a string. This makes `traefik:v3.5` deterministic across replicas without needing the runtime-resolved digest. Net effect: namespace,app,image yields identical hextets across all replicas of the same Deployment except the trailing random N nibble. embed.Values.Pod → App; AllocRequest.Pod kept for log context only, new App and Image fields drive the embed call. handlers.go computes both via deriveAppName + podImageRef helpers. Tests: 7 new TestDeriveAppName_* cases (Deploy/STS/DS/bare/RS-without- hash/non-controller-owner) + TestPodImageRef. Existing fuzz seeds updated for the new keyword. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 11:42:06 -05:00
Donavan Fritz	c860e9351b	ip-algo: pod annotation > NodeConfig annotation > random Build flock Image / build (push) Has been cancelled Details Add flock.fritzlab.net/ip-algo as a node-wide default via NodeConfig metadata.annotations. Pod-level annotation still wins. Empty, missing, or invalid input at either level falls through to the next; invalid values warn-log via the agent's slog. Both unset → fully random IID (unchanged baseline). ParseAnnotations no longer touches ip-algo; ResolveIPAlgo handles the full precedence chain, called from PodHandler.Add with the cached NodeConfig's annotations and the agent logger. Tests: 9 new TestResolveIPAlgo_* cases covering pod-wins, all fall-through paths, both-absent, nil node map, whitespace, and duplicate-as-invalid. Fuzz target rebuilt without ip-algo input space (now exercised by ResolveIPAlgo unit tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 11:09:09 -05:00
Donavan Fritz	a6202a36bd	defaults: built-in baseline is dual-stack (IPv6 + IPv4), not IPv6-only BuiltinFamilyDefaults() now returns {WantV6: true, WantV4: true}. Pods that want a single family explicitly opt out via the flock.fritzlab.net/ipv4 (or ipv6) annotation, or the operator narrows the default at the node level via NodeConfig.Spec.Defaults. Annotation precedence is unchanged: pod annotation > NodeConfig defaults > built-in baseline. Tests updated to reflect the new baseline; the "opt out of v4" path now has explicit coverage. Docs updated: - NodeConfig.Spec.Defaults Go doc + CRD descriptions reflect the new baseline and its overrides - README opening framing softened from "IPv6-first" to "dual-stack, IPv6-friendly"; example pods + spec.defaults table flipped to treat dual-stack as the default and v6/v4-only as overrides - README NetworkPolicy line in the comparison table flipped to "yes (nftables)" since v1 enforcement shipped - Limitations note about IPv4-only destinations rewritten: every pod has v4 by default now, so the question is whether your IPv4 pool is routable beyond your network Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 10:07:48 -05:00
Donavan Fritz	a7dc7bf1f4	anycast: kernel multipath route + L4 hash for multi-pod-per-node Move pure resolver logic out of anycast_linux.go into anycast.go so it's unit-testable on any host. Reshape anycastTarget from a single {hostIface, via} into a sorted list of nexthops; multiple Ready pods on the same node binding the same anycast IP now contribute one nexthop each. installAnycastRoute uses RTA_MULTIPATH (via netlink.Route.MultiPath) when the target has more than one nexthop. Single-nexthop targets keep the simple via-route shape so 1-pod-per-node keeps rendering identically to today's production form in `ip route show`. flock-agent writes net.ipv{4,6}.fib_multipath_hash_policy = 1 at startup so the kernel hashes flows on (saddr, daddr, sport, dport, proto) rather than just IPs. Best-effort: runs privileged in production, so it works; falls back to L3 hash on environments where the write fails (only matters for the multi-pod-per-node case anyway). resolveAnycastTargets sorts nexthops by canonical(via) for stable comparison so a quiet reconcile pass doesn't churn the kernel route. 8 new unit tests cover: 1-pod, 2-pods-same-anycast (multi-nexthop), NotReady drop, no-Ready omits the IP, pending skipped, mixed v6+v4, family mismatch warns, determinism. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 09:57:32 -05:00
Donavan Fritz	5d9b6bfeec	netpol: anchor base-chain jump on veth only, not pod IP The previous base-chain jump matched iifname/oifname AND saddr/daddr == pod eth0 IP. Anycast traffic has the anycast IP as daddr, not the pod's eth0 unicast, so anycast packets skipped the policy chain entirely and fell through to the forward chain's policy=accept. The veth uniquely belongs to one pod. Anything traversing it is to or from that pod by definition (anycast, unicast, future overlay routes). Match on iifname/oifname alone; let the pod-side chain's accept lines + trailing drop be the policy. Validated end-to-end on host001: anycast nginx pod with default-deny ingress NetPol now correctly drops traffic from any peer; adding an allow-from-podSelector rule unblocks only the matched peer. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 09:32:08 -05:00
Donavan Fritz	39ede9130b	netpol: NetworkPolicy v1 enforcement via nftables New pkg/agent/netpol implementing standard networking.k8s.io/v1 NetworkPolicy. Pipeline: pods + policies + namespaces → Translate → Render → Apply. Supports ingress + egress, all three peer types (podSelector, namespaceSelector, ipBlock with except), numeric ports + port ranges, default-deny semantics derived from PolicyTypes (or inferred from non-empty Spec.Egress when unset). Apply path is `nft -f -` shell-out: single transaction, atomic, kernel guarantees partial-failure rollback. Idempotent dedup via last-applied script. Reconcile triggers: informer events, 30s self-heal tick, every CNI ADD/DEL. Verified against the three live cluster NetPols (calico-apiserver, remote-proxies/lodge-home-assistant, storage/garage-admin-restrict). Fuzz target stitches Translate + Render with random selector and peer inputs; 21 unit tests cover the policy semantics. Named ports skip with a warn; deferred until kubelet exposes them in a form that doesn't require shadowing pod state. Dockerfile: + nftables. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 09:25:58 -05:00
Donavan Fritz	71e584cf96	NodeConfig defaults + code-quality pass + fuzz tests + README NodeConfig.Spec.Defaults adds per-node IPv6/IPv4 family defaults that pod annotations can override; built-in baseline (v6=true, v4=false) still applies when the field is omitted. bird.Render now validates every operator-supplied value (peer addresses, CIDRs, anycast IPs, source addresses) before templating — fuzz found a peer address containing `}` produced unbalanced braces in bird.conf. Failing input preserved as a regression seed. Fuzz targets added for ParseAnnotations, ParseCNIArgs, HostIfaceName, canonical, IPAM allocate sequences, embed.Embed, and bird.Render. Hardened canonical/ipToU32 against nil and non-IPv4 inputs. README rewritten for outside readers — quickstart, NodeConfig + annotation reference with worked examples, anycast use cases, comparison vs Calico and Cilium, requirements, limitations. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 09:25:45 -05:00
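
The controller-chain rules listed in commit 65b2fb5b17 (Deployment via ReplicaSet, other controllers as-is, bare pod) can be sketched roughly as below. This is an illustrative sketch only, not the repository's `deriveAppName`: the `ownerRef` type is a hypothetical stand-in for Kubernetes' `metav1.OwnerReference`, the three-argument signature is assumed, and using the pod's `pod-template-hash` label to locate the suffix is an assumption about how the hash is stripped.

```go
package main

import (
	"fmt"
	"strings"
)

// ownerRef is a hypothetical, trimmed-down stand-in for metav1.OwnerReference.
type ownerRef struct {
	Kind       string
	Name       string
	Controller bool
}

// deriveAppName returns a workload identifier that is stable across replicas.
// Assumed rules, following the commit message:
//   - ReplicaSet controller: strip the "-<pod-template-hash>" suffix from its name.
//   - Any other controller (StatefulSet, DaemonSet, Job): use its name as-is.
//   - No controller owner: fall back to the pod name.
func deriveAppName(podName string, labels map[string]string, owners []ownerRef) string {
	for _, o := range owners {
		if !o.Controller {
			continue // non-controller owners are ignored
		}
		if o.Kind == "ReplicaSet" {
			suffix := "-" + labels["pod-template-hash"]
			// Only strip when the label is set and something remains afterwards.
			if len(suffix) > 1 && len(o.Name) > len(suffix) && strings.HasSuffix(o.Name, suffix) {
				return strings.TrimSuffix(o.Name, suffix)
			}
		}
		return o.Name
	}
	return podName
}

func main() {
	labels := map[string]string{"pod-template-hash": "789df685f"}
	owners := []ownerRef{{Kind: "ReplicaSet", Name: "traefik-789df685f", Controller: true}}
	fmt.Println(deriveAppName("traefik-789df685f-abcde", labels, owners)) // prints "traefik"
}
```

A ReplicaSet whose name does not end in the labelled hash falls through to the controller name unchanged, which is one plausible reading of the "RS-without-hash" test case named in the commit.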