bird: per-peer import filter rejects connected subnet

Without a filter, crt001's `network 2602:817:3000:A25::/64` gets
re-advertised to every peer on that subnet. bird installs the BGP /64
with metric 32, beating the kernel-connected route at 256, and all
inter-host VLAN-25 traffic hairpins through the gateway — losing PMTU
9000 and ~30x throughput. Broke Plex 2026-05-04: NFS to nas002 capped
at 7 MB/s, jumbo blackholed.

Add LocalSubnetV6/V4 (CIDR) to NodeBGP. Agent populates by masking the
peer's address to /64 (v6) or /24 (v4) — same fritzlab convention
already in localAddrSameSubnet. Render emits `import where net !=
<subnet>;` per BGP channel when set, falls back to `import all;`
otherwise so existing tests stay green.

Defence in depth: together with the matching outbound route-map on crt001
(ROUTE_MAP_CLUSTER_OUT_V{4,6}), the agent now refuses the leak on its
own if the router filter ever drifts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-04 21:03:59 -05:00

Path	Last commit	Date
.gitea/workflows	ci: trigger dispatch after scheduler reset	2026-04-26 17:53:55 -05:00
cmd	M2: netlink, IPAM/handler wiring, BIRD sidecar, CNI installer	2026-04-24 22:33:48 -05:00
deploy	deploy: drop fritzlab.net/cni-test toleration	2026-04-25 11:42:48 -05:00
pkg	bird: per-peer import filter rejects connected subnet	2026-05-04 21:03:59 -05:00
.dockerignore	flock M1 scaffold: CNI plugin + agent + NodeConfig CRD	2026-04-24 21:17:42 -05:00
Dockerfile	netpol: NetworkPolicy v1 enforcement via nftables	2026-04-25 09:25:58 -05:00
go.mod	M2: netlink, IPAM/handler wiring, BIRD sidecar, CNI installer	2026-04-24 22:33:48 -05:00
go.sum	M2: netlink, IPAM/handler wiring, BIRD sidecar, CNI installer	2026-04-24 22:33:48 -05:00
LICENSE	flock M1 scaffold: CNI plugin + agent + NodeConfig CRD	2026-04-24 21:17:42 -05:00
README.md	agent: addresses annotation replaces IPAM allocation	2026-04-29 09:46:48 -05:00

README.md

flock

A small, opinionated Kubernetes CNI built around three ideas:

Dual-stack, IPv6-friendly. Every pod gets a globally routable IPv6 address by default. IPv4 is also enabled by default; either family can be turned off per-node or per-pod when you really mean to.
No tunnels, no NAT. Pod addresses appear unmodified in the packets on the wire. Each node speaks BGP to its upstream router and advertises its own per-node prefix. The pod network is just the LAN, plus host routes.
Anycast as a primitive. A pod can request an anycast address via an annotation; flock binds it on the pod's loopback and advertises a /128 (or /32) over BGP, but only while the pod is Ready. Multiple replicas advertise the same address from different nodes for ECMP load balancing without a separate Service or external LB.

flock is built for clusters where every node already speaks BGP to one or more upstream routers. It deliberately leaves out features you'd expect from a general-purpose CNI (overlays, IPsec/WireGuard, IPAM coordination across nodes, kube-proxy integration) so the moving parts that remain are easy to reason about.

Status: alpha. CRD shape and annotation keys may still change.

How it works
Requirements
Quickstart
NodeConfig CRD
Pod annotations
Use cases
Comparison vs Calico / Cilium
Limitations and non-goals
Building and testing
License

How it works

Each node runs a single flock-agent DaemonSet pod with three containers:

a privileged init container (flock-installer) that drops the CNI plugin binary into /opt/cni/bin/flock and writes /etc/cni/net.d/01-flock.conflist,
the agent itself, which owns IPAM, programs veth pairs, and tracks pod readiness, and
a BIRD2 sidecar that the agent re-renders and reloads when the per-node config or the active anycast set changes.

Each node has a NodeConfig CR (cluster-scoped, name = node name) that declares its IPv6 and IPv4 prefixes, its local BGP ASN, and its upstream peers. The agent reads the CR via a dynamic informer.

When kubelet runs the CNI plugin on ADD, the plugin opens a unix-socket RPC to the agent. The agent allocates an address from the per-node CIDRs, creates a veth pair, configures the pod side, persists the allocation to /var/lib/flock/allocations.json, and returns the result. There is no controller loop and no IPAM coordination across nodes — each node owns a non-overlapping CIDR and allocates locally.

For anycast, the agent installs host routes of the form `<anycast-ip> via <pod-eth0-ip> dev <veth>` on the node and adds the anycast IP to BIRD's BGP export filter. When a pod loses readiness, the agent withdraws the route from both the kernel and BGP within one reconcile cycle (sub-second).

Packet path

pod.eth0 (a veth) ↔ host-side veth (with addrgenmode none, fe80::1/64, proxy-ARP for the v4 default-via) ↔ host kernel ↔ uplink NIC ↔ upstream router. No conntrack, no SNAT, no encapsulation.

For IPv6 the host side of every veth carries the deterministic link-local gateway fe80::1, so every pod can use a fixed default route. For IPv4 the host side answers ARP for 169.254.1.1, providing the same fixed default route in v4.

Requirements

Linux nodes. flock has not been tested on, and does not target, Windows nodes.
Kubernetes ≥ 1.27.
An upstream router (or pair) that accepts a BGP session from each node. flock has been tested with Cisco IOS-XE, Arista EOS, and FRR acting as the upstream; anything that speaks standard eBGP should work.
A globally routable (or at least datacentre-routable) IPv6 prefix delegated to the cluster, sliced into one /64 per node. IPv4 is optional but supported.
Each node must have a unique local ASN. Private ASNs (64512–65534, 4200000000–4294967294) are typical.

Quickstart

# 1. Install CRD + RBAC + DaemonSet (single bundled manifest):
kubectl apply -f deploy/install.yaml

# 2. Label the node(s) you want flock to manage:
kubectl label node <node-name> flock.fritzlab.net/agent=

# 3. Apply a NodeConfig CR for that node (see "NodeConfig CRD" below):
kubectl apply -f my-nodeconfig.yaml

# 4. Verify the agent is up:
kubectl -n kube-system get pod -l app=flock-agent -o wide
kubectl -n kube-system exec -it ds/flock-agent -c bird -- \
    birdc -s /run/flock/bird.ctl show protocols

The DaemonSet is gated by the flock.fritzlab.net/agent node label, so unlabelled nodes continue to use whatever CNI was installed before. This lets you migrate node-by-node — start with one node, prove it works, then proceed.

NodeConfig CRD

A NodeConfig is the only operator-supplied input. One per node, name matches the node name. Example:

apiVersion: flock.fritzlab.net/v1alpha1
kind: NodeConfig
metadata:
  name: node-a
spec:
  cidr6:
    - 2001:db8:f001::/64       # Pods on this node get addresses from here.
  cidr4:
    - 192.0.2.0/24             # IPv4 pool, used whenever IPv4 is enabled (the default).
  defaults:
    ipv6: true                 # Optional. Built-in baseline if omitted.
    ipv4: true                 # Optional. Built-in baseline if omitted.
  bgp:
    asn: 65101                 # This node's local ASN.
    peers:
      - address: 2001:db8::1   # Upstream router (IPv6 session).
        asn: 65000
      - address: 192.0.2.1     # Same router, IPv4 session.
        asn: 65000

`spec.defaults`

spec.defaults controls which address families a pod gets by default on this node — i.e. when the pod has no explicit flock.fritzlab.net/ipv6 or flock.fritzlab.net/ipv4 annotation. Pod annotations always override. If you omit spec.defaults (or any individual field inside it) flock falls back to its built-in baseline of dual-stack (IPv6 on, IPv4 on).

Goal	`spec.defaults`
Dual-stack (the default)	omit, or `{ ipv6: true, ipv4: true }`
IPv6-only node	`{ ipv6: true, ipv4: false }`
IPv4-only (legacy node)	`{ ipv6: false, ipv4: true }`

A NodeConfig that resolves to "neither family" is rejected at allocation time, so misconfiguring both to false will surface as an error on the first CNI ADD.

`spec.bgp`

Each peer becomes one BGP session. The agent picks a node-local source address on the same subnet as the peer; if there isn't one, BIRD uses its default. Multi-homing (several peers per address family, e.g. a redundant pair of upstream routers) is allowed.

Pod annotations

All annotations live under flock.fritzlab.net/. Every annotation is optional; leave them off to inherit the per-node defaults.

Annotation	Type	Purpose
`flock.fritzlab.net/ipv6`	bool	Override `spec.defaults.ipv6` for this pod (`true`/`false`).
`flock.fritzlab.net/ipv4`	bool	Override `spec.defaults.ipv4` for this pod (`true`/`false`).
`flock.fritzlab.net/cidr6`	CIDRs	Restrict IPv6 allocation to a sub-range of the node's `cidr6`. Comma-separated.
`flock.fritzlab.net/cidr4`	CIDRs	Restrict IPv4 allocation to a sub-range of the node's `cidr4`. Comma-separated.
`flock.fritzlab.net/ip-algo`	list	Embed identity into the IPv6 IID. Subset of `namespace,pod,image`, in order, comma-separated.
`flock.fritzlab.net/anycast`	IPs	Bind these IPs on the pod's `lo`; advertise via BGP while pod is `Ready`. Mixed v6+v4 ok.
`flock.fritzlab.net/addresses`	IPs	Bind these IPs on the pod's `eth0`. The first v6 and first v4 replace the IPAM allocation for their family and become the pod's primary IPs. Mixed v6+v4 ok. Single-replica only in practice.

Bool values must be the literal strings "true" or "false" (case-insensitive, surrounding whitespace tolerated). Other values — 1, 0, yes, no — are rejected so a typo can't silently flip behaviour.
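A sketch of the strict parsing rule, assuming the annotation value arrives as a plain string. `parseBoolAnnotation` is a hypothetical name. Go's `strconv.ParseBool` is deliberately avoided here because it also accepts `1`, `0`, `t` and `f`, exactly the looseness described above.

```go
package main

import (
	"fmt"
	"strings"
)

// parseBoolAnnotation accepts only the words "true" or "false", in any case
// and with surrounding whitespace ignored. Everything else is an error, so a
// typo fails loudly instead of silently flipping a family on or off.
func parseBoolAnnotation(key, value string) (bool, error) {
	switch strings.ToLower(strings.TrimSpace(value)) {
	case "true":
		return true, nil
	case "false":
		return false, nil
	default:
		return false, fmt.Errorf("annotation %s: want \"true\" or \"false\", got %q", key, value)
	}
}
```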

`addresses` vs `anycast`

Both annotations bind operator-supplied IPs onto a pod and have flock advertise /128 (or /32) per-pod over BGP. The differences are where the IP lands and what it's for:

	`anycast`	`addresses`
Bound on	pod `lo`	pod `eth0`
Multi-replica?	yes — every Ready replica advertises the same IP and the upstream router ECMPs across them	no — the same IP on multiple replicas is operator error
Replaces IPAM?	no — pod still has an IPAM-allocated unicast IP	yes — the first v6 + first v4 in the list become the pod's primary IPs in place of an IPAM allocation
Workload visibility	only the IPAM IP is on the primary interface	the public IP is `eth0`'s primary address — workloads that read their own NIC see it (e.g. Plex's remote-access detection)

Use anycast for shared services with many replicas (DNS, ingress). Use addresses when one specific pod needs a known public IP that the workload itself must see on its primary interface.

Conflict detection

addresses and anycast reject pods that supply an IP whose family is disabled. If the resolved WantV4 is false (via the pod's ipv4 annotation or the NodeConfig default) and any addresses- or anycast-supplied IP is IPv4, the CNI ADD fails with an explicit error. Same for v6. Both annotation types put IPs on a pod interface and rely on the family being enabled for return-path routing — silently accepting the IP would leave a non-functional pod.

Outside-aggregate advertisement

When an `addresses` IP replaces IPAM (becomes the pod's primary IP), the IP is typically outside the node's BGP aggregate (e.g. a public IPv4 address on a node whose pod CIDR is private). flock notices this during BGP rendering and advertises the IP individually as a per-pod /32 or /128 so the upstream router has a route to it.

Example pods

Default dual-stack — no annotations needed:

apiVersion: v1
kind: Pod
metadata:
  name: minimal

IPv6 only — opt out of the default v4 allocation:

apiVersion: v1
kind: Pod
metadata:
  name: v6-only
  annotations:
    flock.fritzlab.net/ipv4: "false"

Operator-friendly addressing — fnv(namespace) | fnv(pod) | random packed into the host bits, so a pod's identity is recognisable from its IP in kubectl get pods -o wide:

metadata:
  annotations:
    flock.fritzlab.net/ip-algo: "namespace,pod"

Anycast service — three replicas, each advertising the same v6+v4 anycast pair from the node it lands on. The upstream router does ECMP across the active set:

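apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns
spec:
  replicas: 3
  selector:
    matchLabels:
      app: dns                 # apps/v1 requires a selector matching the template labels.
  template:
    metadata:
      labels:
        app: dns
      annotations:
        flock.fritzlab.net/anycast: "2001:db8:a::53, 192.0.2.53"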
    spec:
      containers:
        - name: coredns
          image: coredns/coredns
          readinessProbe:
            httpGet: { path: /ready, port: 8181 }
            periodSeconds: 1
            failureThreshold: 1

Workload with a known public IP — single-replica pod whose application inspects its own primary interface (Plex's remote-access flow). The addresses become the pod's primary IPs in place of any IPAM allocation; the pod's eth0 ends up with exactly the supplied addresses, and BGP advertises them as a /128 and /32:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: plex
spec:
  replicas: 1
  selector:
    matchLabels:
      app: plex
  template:
    metadata:
      labels:
        app: plex
      annotations:
        flock.fritzlab.net/addresses: "2001:db8:c606::166, 192.0.2.166"
    spec:
      containers:
        - name: plex
          image: plexinc/pms-docker

Use cases

Highly-available DNS. Run N CoreDNS replicas, each annotated with the same anycast IP. Point client /etc/resolv.conf at the anycast address. Each replica advertises a /128 from its own node; the upstream router does ECMP. Lose a pod, traffic fails over within a probe cycle.

Replacing a kube-proxy ClusterIP. Headless Service plus an anycast IP gives you a single stable address with load-balancing across pods, without the DNAT-pinning that makes long-lived TCP keepalive connections stick to one backend forever. ECMP routes each new flow independently.

Per-pod public IPv6. Because every pod has a globally routable IPv6 address and the cluster does no NAT, a pod's eth0 IP is reachable from the rest of the internet (subject to your firewall). Useful for things like outgoing SMTP, where you want a stable from-address per pod, or for peer-to-peer protocols that don't tolerate NAT.

Fast pod identification in kubectl. With flock.fritzlab.net/ip-algo: namespace,pod the IPv6 host bits encode the pod's namespace+name, so you can recognise a pod from its IP without a lookup. Reverse-DNS via a wildcard zone makes those IPs human-readable too.

Static-IP migration. Annotation-driven address allocation means you can ask for a specific sub-CIDR (cidr6: 2001:db8:f001::ab00/120) for services that previously needed pinned IPs (mail server, ingress controller). When the static-IP requirement goes away, drop the annotation and the pod gets a normal allocation.

Comparison vs Calico / Cilium

	flock	Calico	Cilium
Default address family	dual (IPv6+IPv4)	IPv4	dual
BGP	yes (BIRD)	yes	optional
Overlay (VXLAN/IPIP)	never	optional	yes (VXLAN or Geneve) or native routing
NAT in datapath	never	masquerade by default	masquerade by default
Anycast pod addressing	first-class	manual	optional, via service mesh
eBPF datapath	no	optional	yes
NetworkPolicy	yes (nftables)	yes (Felix)	yes (eBPF)
Cluster size target	small (< 100 nodes)	thousands	thousands
Operational surface area	low (1 DaemonSet, 1 CRD)	medium	high
Production-ready	alpha	yes	yes

flock is not trying to compete with Calico or Cilium. The right answer for most clusters is one of those two — flock exists for clusters where every node already speaks BGP, the operator wants real (no NAT) IPv6 addressing on every pod, and per-pod anycast is something they actually want to use rather than work around.

Limitations and non-goals

NetworkPolicy supports networking.k8s.io/v1 (ingress + egress, all three peer types, numeric ports + port ranges). Named ports and AdminNetworkPolicy are not yet implemented.
No NAT, no masquerade, no SNAT-egress. Pods reach the wider internet using their real cluster-routable addresses; if your IPv4 pool isn't routable beyond your network, those pods can't reach v4-only hosts on the public internet without help from your border router.
No multi-cluster, no peering across clusters.
Linux-only datapath.
IPAM is per-node — there's no global allocator and no IP mobility. When a pod moves to a different node it gets a new address.
The agent is privileged. It mounts /var/run/netns, configures veth pairs, manages kernel routes, and holds CAP_NET_ADMIN. This is inherent to being a CNI; reducing privilege further is not a goal.
If BIRD dies but the agent stays up, pods on that node stop being reachable from off-node. The DaemonSet liveness probes catch this.

Building and testing

# Unit tests + fuzz seed corpora (fast, ~1s):
go test ./...

# Targeted fuzz pass:
go test -run NEVERMATCH -fuzz=FuzzParseAnnotations -fuzztime=30s ./pkg/agent
go test -run NEVERMATCH -fuzz=FuzzRender           -fuzztime=30s ./pkg/routing/bird
go test -run NEVERMATCH -fuzz=FuzzEmbed            -fuzztime=30s ./pkg/embed
go test -run NEVERMATCH -fuzz=FuzzIPAM_Allocate    -fuzztime=30s ./pkg/agent

# Build the container image (used by the DaemonSet):
docker build -t flock:dev .

The fuzz tests are also run as plain unit tests via their seed corpora, so every go test ./... exercises the discovered edge cases as regressions.

pkg/agent has Linux-only files (*_linux.go) for netlink and netns work; the macOS/Windows build pulls in stubs from *_stub.go so tests run cleanly on developer laptops.

License

Apache 2.0 — see LICENSE.
