Compare commits

2 Commits

Author SHA1 Message Date
Donavan Fritz 39ede9130b netpol: NetworkPolicy v1 enforcement via nftables
New pkg/agent/netpol implementing standard networking.k8s.io/v1
NetworkPolicy. Pipeline:

  pods + policies + namespaces  →  Translate  →  Render  →  Apply

Supports ingress + egress, all three peer types (podSelector,
namespaceSelector, ipBlock with except), numeric ports + port ranges,
default-deny semantics derived from PolicyTypes (or inferred from
non-empty Spec.Egress when unset).
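The PolicyTypes inference described above can be sketched as follows. This is a minimal illustration, not the code in `pkg/agent/netpol`: `Policy` and `EffectiveTypes` are hypothetical stand-ins for the real `networking.k8s.io/v1` types, and the only rule assumed is the upstream default (Ingress always implied, Egress implied only when egress rules exist).

```go
package main

import "fmt"

// Policy is a hypothetical, trimmed-down stand-in for the parts of a
// NetworkPolicy spec that decide which directions are isolated.
type Policy struct {
	PolicyTypes []string // "Ingress" and/or "Egress"; may be empty
	EgressRules int      // number of entries in spec.egress
}

// EffectiveTypes reports which directions the policy isolates.
func EffectiveTypes(p Policy) (ingress, egress bool) {
	if len(p.PolicyTypes) == 0 {
		// Unset: Ingress is always implied; Egress only when egress
		// rules are present.
		return true, p.EgressRules > 0
	}
	for _, t := range p.PolicyTypes {
		switch t {
		case "Ingress":
			ingress = true
		case "Egress":
			egress = true
		}
	}
	return ingress, egress
}

func main() {
	in, eg := EffectiveTypes(Policy{EgressRules: 1})
	fmt.Println("ingress isolated:", in, "egress isolated:", eg)
}
```

A selected pod is default-denied in each direction for which the function returns true, and only the listed rules open traffic back up.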

Apply path is `nft -f -` shell-out — single transaction, atomic, kernel
guarantees partial-failure rollback. Idempotent dedup via last-applied
script. Reconcile triggers: informer events, 30s self-heal tick, every
CNI ADD/DEL.
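One way the dedup-then-apply step could look, as a sketch under stated assumptions: `Applier`, `runNft`, and the injectable `run` field are hypothetical names chosen here, not identifiers from the commit. The only external behaviour relied on is that `nft -f -` reads a ruleset from stdin.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// Applier remembers the last script that was applied successfully.
type Applier struct {
	last    string
	applied bool
	// run is injectable so the sketch can be exercised without nft.
	run func(script string) error
}

// runNft pipes the rendered script to `nft -f -`.
func runNft(script string) error {
	cmd := exec.Command("nft", "-f", "-")
	cmd.Stdin = strings.NewReader(script)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("nft -f -: %w: %s", err, out)
	}
	return nil
}

// Apply skips the shell-out when the rendered script is unchanged.
// On failure the remembered script is left alone, so the next
// reconcile trigger retries.
func (a *Applier) Apply(script string) (bool, error) {
	if a.applied && a.last == script {
		return false, nil
	}
	if err := a.run(script); err != nil {
		return false, err
	}
	a.last, a.applied = script, true
	return true, nil
}

func main() {
	calls := 0
	// A real agent would set run: runNft; a counter stands in here.
	a := &Applier{run: func(string) error { calls++; return nil }}
	a.Apply("# rendered rules\n")
	a.Apply("# rendered rules\n")
	fmt.Println("shell-outs:", calls)
}
```

Comparing whole scripts keeps the three reconcile triggers cheap: most ticks render an identical script and never reach the kernel.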

Verified against the three live cluster NetPols (calico-apiserver,
remote-proxies/lodge-home-assistant, storage/garage-admin-restrict).
Fuzz target stitches Translate + Render with random selector and peer
inputs; 21 unit tests cover the policy semantics.

Named ports skip with a warn — deferred until kubelet exposes them in a
form that doesn't require shadowing pod state.

Dockerfile: + nftables.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 09:25:58 -05:00
Donavan Fritz 71e584cf96 NodeConfig defaults + code-quality pass + fuzz tests + README
NodeConfig.Spec.Defaults adds per-node IPv6/IPv4 family defaults that pod
annotations can override; built-in baseline (v6=true, v4=false) still
applies when the field is omitted.

bird.Render now validates every operator-supplied value (peer addresses,
CIDRs, anycast IPs, source addresses) before templating — fuzz found a
peer address containing `}` produced unbalanced braces in bird.conf.
Failing input preserved as a regression seed.
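The kind of pre-template validation described above might be sketched like this. `validatePeer` is a hypothetical helper, not the function in `pkg/routing/bird`; it assumes that strict parsing plus re-serialisation is enough to keep template metacharacters out of the rendered config.

```go
package main

import (
	"fmt"
	"net/netip"
)

// validatePeer parses the address strictly and returns its canonical
// string form, so only that form can reach a config template.
func validatePeer(s string) (string, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return "", fmt.Errorf("peer address %q: %w", s, err)
	}
	// ParseAddr accepts IPv6 zones such as "fe80::1%eth0", and a zone
	// may contain arbitrary characters, so reject zones outright.
	if addr.Zone() != "" {
		return "", fmt.Errorf("peer address %q: zones are not allowed", s)
	}
	return addr.String(), nil
}

func main() {
	for _, in := range []string{"2001:db8::1", "2001:db8::1}", "fe80::1%}"} {
		out, err := validatePeer(in)
		fmt.Printf("%q -> %q, err=%v\n", in, out, err)
	}
}
```

Templating the parsed value rather than the operator's original string means a later change to the validator cannot reintroduce the unbalanced-brace case.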

Fuzz targets added for ParseAnnotations, ParseCNIArgs, HostIfaceName,
canonical, IPAM allocate sequences, embed.Embed, and bird.Render.
Hardened canonical/ipToU32 against nil and non-IPv4 inputs.

README rewritten for outside readers — quickstart, NodeConfig + annotation
reference with worked examples, anycast use cases, comparison vs Calico
and Cilium, requirements, limitations.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 09:25:45 -05:00
33 changed files with 4281 additions and 102 deletions
+1 -1
@@ -21,7 +21,7 @@ RUN CGO_ENABLED=0 go build -trimpath \
-o /out/flock-installer ./cmd/flock-installer
FROM alpine:3.21
RUN apk add --no-cache iproute2 bird ca-certificates
RUN apk add --no-cache iproute2 bird nftables ca-certificates
COPY --from=build /out/flock /usr/local/bin/flock
COPY --from=build /out/flock-agent /usr/local/bin/flock-agent
COPY --from=build /out/flock-installer /usr/local/bin/flock-installer
+326 -13
@@ -1,22 +1,335 @@
# flock
Kubernetes CNI for sjc001. Per-pod IPv4 opt-in, IID embedding, Ready-gated anycast via BGP.
A small, opinionated Kubernetes CNI built around three ideas:
Design doc: `k8s-manager/dfritz-cni.md` (in the operator's k8s-manager repo).
1. **IPv6-first.** Every pod gets a globally routable IPv6 address. IPv4 is
per-pod opt-in for legacy clients.
2. **No tunnels, no NAT.** Pod addresses are the real packets on the wire.
Each node speaks BGP to its upstream router and advertises its own
per-node prefix. The pod network is just the LAN, plus host routes.
3. **Anycast as a primitive.** A pod can request an anycast address via
an annotation; flock binds it on the pod's loopback and advertises a
`/128` (or `/32`) over BGP, but only while the pod is `Ready`. Multiple
replicas advertise the same address from different nodes for ECMP load
balancing without a separate Service or external LB.
Status: M1 scaffold. Not functional. See milestones table in the design doc.
flock is built for clusters where every node already speaks BGP to one
or more upstream routers. It deliberately leaves out features you'd
expect from a general-purpose CNI — overlays, IPsec/Wireguard, IPAM
coordination across nodes, kube-proxy integration — so the moving parts
that remain are easy to reason about.
## Layout
> **Status:** alpha. CRD shape and annotation keys may still change.
- `cmd/flock` — CNI plugin binary (kubelet-invoked)
- `cmd/flock-agent` — DaemonSet binary
- `pkg/api/v1alpha1` — `NodeConfig` CRD types
- `pkg/cni` — CNI plugin internals + RPC client
- `pkg/agent` — agent server, IPAM, state file, anycast, NetworkPolicy
- `pkg/embed` — `ip-algo` IID embedding (pure)
- `pkg/routing/{bird,ospf}` — routing backends
- `deploy/` — CRDs, RBAC, DaemonSet manifests
## Table of contents
- [How it works](#how-it-works)
- [Requirements](#requirements)
- [Quickstart](#quickstart)
- [NodeConfig CRD](#nodeconfig-crd)
- [Pod annotations](#pod-annotations)
- [Use cases](#use-cases)
- [Comparison vs Calico / Cilium](#comparison-vs-calico--cilium)
- [Limitations and non-goals](#limitations-and-non-goals)
- [Building and testing](#building-and-testing)
- [License](#license)
## How it works
Each node runs a single `flock-agent` DaemonSet pod with three containers:
- a privileged init container (`flock-installer`) that drops the CNI
plugin binary into `/opt/cni/bin/flock` and writes
`/etc/cni/net.d/01-flock.conflist`,
- the agent itself, which owns IPAM, programs veth pairs, and tracks
pod readiness, and
- a [BIRD2](https://bird.network.cz/) sidecar that the agent re-renders
and reloads when the per-node config or the active anycast set changes.
Each node has a `NodeConfig` CR (cluster-scoped, name = node name) that
declares its IPv6 and IPv4 prefixes, its local BGP ASN, and its upstream
peers. The agent reads the CR via a dynamic informer.
When kubelet runs the CNI plugin on `ADD`, the plugin opens a unix-socket
RPC to the agent. The agent allocates an address from the per-node
CIDRs, creates a veth pair, configures the pod side, persists the
allocation to `/var/lib/flock/allocations.json`, and returns the result.
There is no controller loop and no IPAM coordination across nodes — each
node owns a non-overlapping CIDR and allocates locally.
For anycast, the agent installs `<anycast-ip> via <pod-eth0-ip> dev <veth>`
host routes on the node and adds the anycast IP to BIRD's BGP export
filter. When a pod loses readiness, the agent withdraws the route from
both the kernel and BGP within one reconcile cycle (sub-second).
### Packet path
`pod.eth0` (a veth) ↔ host-side veth (with `addrgenmode none`,
`fe80::1/64`, proxy-ARP for the v4 default-via) ↔ host kernel ↔ uplink
NIC ↔ upstream router. No conntrack, no SNAT, no encapsulation.
For IPv6 the host side of every veth carries the deterministic link-local
gateway `fe80::1`, so every pod can use a fixed default route. For IPv4
the host side answers ARP for `169.254.1.1`, providing the same fixed
default route in v4.
## Requirements
- Linux nodes. flock has not been tested on, and does not target,
Windows nodes.
- Kubernetes ≥ 1.27.
- An upstream router (or pair) that accepts a BGP session from each
node. flock has been tested with Cisco IOS-XE, Arista EOS, and FRR
acting as the upstream; anything that speaks standard eBGP should work.
- Globally routable (or at least datacentre-routable) IPv6 prefix
delegated to the cluster, sliced into a per-node /64. IPv4 is
optional but supported.
- Each node must have a unique local ASN. Private ASNs (`64512–65534`,
`4200000000–4294967294`) are typical.
## Quickstart
```sh
# 1. Install CRD + RBAC + DaemonSet (single bundled manifest):
kubectl apply -f deploy/install.yaml
# 2. Label the node(s) you want flock to manage:
kubectl label node <node-name> flock.fritzlab.net/agent=
# 3. Apply a NodeConfig CR for that node (see "NodeConfig CRD" below):
kubectl apply -f my-nodeconfig.yaml
# 4. Verify the agent is up:
kubectl -n kube-system get pod -l app=flock-agent -o wide
kubectl -n kube-system exec -it ds/flock-agent -c bird -- \
birdc -s /run/flock/bird.ctl show protocols
```
The DaemonSet is gated by the `flock.fritzlab.net/agent` node label, so
unlabelled nodes continue to use whatever CNI was installed before. This
lets you migrate node-by-node — start with one node, prove it works, then
proceed.
## NodeConfig CRD
A `NodeConfig` is the only operator-supplied input. One per node, name
matches the node name. Example:
```yaml
apiVersion: flock.fritzlab.net/v1alpha1
kind: NodeConfig
metadata:
  name: node-a
spec:
  cidr6:
    - 2001:db8:f001::/64 # Pods on this node get addresses from here.
  cidr4:
    - 192.0.2.0/24 # IPv4 pool, used only when a pod opts in.
  defaults:
    ipv6: true # Optional. Built-in baseline if omitted.
    ipv4: false # Optional. Built-in baseline if omitted.
  bgp:
    asn: 65101 # This node's local ASN.
    peers:
      - address: 2001:db8::1 # Upstream router (IPv6 session).
        asn: 65000
      - address: 192.0.2.1 # Same router, IPv4 session.
        asn: 65000
```
### `spec.defaults`
`spec.defaults` controls which address families a pod *gets by default*
on this node — i.e. when the pod has no explicit `flock.fritzlab.net/ipv6`
or `flock.fritzlab.net/ipv4` annotation. Pod annotations always override.
If you omit `spec.defaults` (or any individual field inside it) flock
falls back to its built-in baseline of **IPv6 on, IPv4 off**.
| Goal | `spec.defaults` |
|---------------------------|----------------------------------------|
| IPv6-only (the default) | omit, or `{ ipv6: true, ipv4: false }`|
| Dual-stack by default | `{ ipv6: true, ipv4: true }` |
| IPv4-only (legacy node) | `{ ipv6: false, ipv4: true }` |
A NodeConfig that resolves to "neither family" is rejected at allocation
time, so misconfiguring both to false will surface as an error on the
first `CNI ADD`.
### `spec.bgp`
Each `peer` becomes one BGP session. The agent picks a node-local source
address on the same subnet as the peer; if there isn't one, BIRD uses
its default. Multi-homing (multiple peers per family — or per upstream
router pair) is allowed.
## Pod annotations
All annotations live under `flock.fritzlab.net/`. Every annotation is
optional; leave them off to inherit the per-node defaults.
| Annotation | Type | Purpose |
|-------------------------------------|--------|-----------------------------------------------------------------------------------------------|
| `flock.fritzlab.net/ipv6` | bool | Override `spec.defaults.ipv6` for this pod (`true`/`false`). |
| `flock.fritzlab.net/ipv4` | bool | Override `spec.defaults.ipv4` for this pod (`true`/`false`). |
| `flock.fritzlab.net/cidr6` | CIDRs | Restrict IPv6 allocation to a sub-range of the node's `cidr6`. Comma-separated. |
| `flock.fritzlab.net/cidr4` | CIDRs | Restrict IPv4 allocation to a sub-range of the node's `cidr4`. Comma-separated. |
| `flock.fritzlab.net/ip-algo` | list | Embed identity into the IPv6 IID. Subset of `namespace,pod,image`, in order, comma-separated. |
| `flock.fritzlab.net/anycast` | IPs | Bind these IPs on the pod's `lo`; advertise via BGP while pod is `Ready`. Mixed v6+v4 ok. |
Bool values must be the literal strings `"true"` or `"false"`
(case-insensitive, surrounding whitespace tolerated). Other values —
`1`, `0`, `yes`, `no` — are rejected so a typo can't silently flip
behaviour.
### Example pods
Default IPv6-only — no annotations needed:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: minimal
```
Dual-stack on a node whose default is IPv6-only:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-client
  annotations:
    flock.fritzlab.net/ipv4: "true"
```
Operator-friendly addressing — `fnv(namespace) | fnv(pod) | random`
packed into the host bits, so a pod's identity is recognisable from
its IP in `kubectl get pods -o wide`:
```yaml
metadata:
  annotations:
    flock.fritzlab.net/ip-algo: "namespace,pod"
```
Anycast service — three replicas, each advertising the same v6+v4
anycast pair from the node it lands on. The upstream router does ECMP
across the active set:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns
spec:
  replicas: 3
  template:
    metadata:
      annotations:
        flock.fritzlab.net/ipv4: "true"
        flock.fritzlab.net/anycast: "2001:db8:a::53, 192.0.2.53"
    spec:
      containers:
        - name: coredns
          image: coredns/coredns
          readinessProbe:
            httpGet: { path: /ready, port: 8181 }
            periodSeconds: 1
            failureThreshold: 1
```
## Use cases
**Highly-available DNS.** Run N CoreDNS replicas, each annotated with
the same `anycast` IP. Point client `/etc/resolv.conf` at the anycast
address. Each replica advertises a `/128` from its own node; the
upstream router does ECMP. Lose a pod, traffic fails over within a
probe cycle.
**Replacing a kube-proxy `ClusterIP`.** Headless Service plus an anycast
IP gives you a single stable address with load-balancing across pods,
without the DNAT-pinning that makes long-lived TCP keepalive connections
stick to one backend forever. ECMP routes each new flow independently.
**Per-pod public IPv6.** Because every pod has a globally routable IPv6
address and the cluster does no NAT, a pod's `eth0` IP is reachable from
the rest of the internet (subject to your firewall). Useful for things
like outgoing SMTP, where you want a stable from-address per pod, or for
peer-to-peer protocols that don't tolerate NAT.
**Fast pod identification in `kubectl`.** With
`flock.fritzlab.net/ip-algo: namespace,pod` the IPv6 host bits encode
the pod's namespace+name, so you can recognise a pod from its IP without
a lookup. Reverse-DNS via a wildcard zone makes those IPs human-readable
too.
**Static-IP migration.** Annotation-driven address allocation means you
can ask for a specific sub-CIDR (`cidr6: 2001:db8:f001::ab00/120`) for
services that previously needed pinned IPs (mail server, ingress
controller). When the static-IP requirement goes away, drop the
annotation and the pod gets a normal allocation.
## Comparison vs Calico / Cilium
| | flock | Calico | Cilium |
|--------------------------|-----------------------------|------------------------------|------------------------------|
| Default address family | IPv6 | IPv4 | dual |
| BGP | yes (BIRD) | yes | optional |
| Overlay (VXLAN/IPIP) | never | optional | yes (geneve) or native |
| NAT in datapath | never | masquerade by default | masquerade by default |
| Anycast pod addressing | first-class | manual | optional, via service mesh |
| eBPF datapath | no | optional | yes |
| NetworkPolicy | not yet | yes (Felix) | yes (eBPF) |
| Cluster size target | small (< 100 nodes) | thousands | thousands |
| Operational surface area | low (1 DaemonSet, 1 CRD) | medium | high |
| Production-ready | alpha | yes | yes |
flock is not trying to compete with Calico or Cilium. The right answer
for most clusters is one of those two — flock exists for clusters where
every node already speaks BGP, the operator wants to think in IPv6-first
terms, and per-pod anycast is something they actually want to use rather
than work around.
## Limitations and non-goals
- No NetworkPolicy enforcement yet (planned).
- No NAT, no masquerade, no SNAT-egress. If your pods need to reach a
legacy IPv4-only destination, give them an IPv4 address explicitly.
- No multi-cluster, no peering across clusters.
- Linux-only datapath.
- IPAM is per-node — there's no global allocator and no IP mobility.
When a pod moves to a different node it gets a new address.
- The agent is privileged. It mounts `/var/run/netns`, configures veth
pairs, manages kernel routes, and holds `CAP_NET_ADMIN`. This is
inherent to being a CNI; reducing privilege further is not a goal.
- If BIRD dies but the agent stays up, pods on that node stop being
reachable from off-node. The DaemonSet liveness probes catch this.
## Building and testing
```sh
# Unit tests + fuzz seed corpora (fast, ~1s):
go test ./...
# Targeted fuzz pass:
go test -run NEVERMATCH -fuzz=FuzzParseAnnotations -fuzztime=30s ./pkg/agent
go test -run NEVERMATCH -fuzz=FuzzRender -fuzztime=30s ./pkg/routing/bird
go test -run NEVERMATCH -fuzz=FuzzEmbed -fuzztime=30s ./pkg/embed
go test -run NEVERMATCH -fuzz=FuzzIPAM_Allocate -fuzztime=30s ./pkg/agent
# Build the container image (used by the DaemonSet):
docker build -t flock:dev .
```
The fuzz tests are also run as plain unit tests via their seed corpora,
so every `go test ./...` exercises the discovered edge cases as
regressions.
`pkg/agent` has Linux-only files (`*_linux.go`) for netlink and netns
work; the macOS/Windows build pulls in stubs from `*_stub.go` so tests
run cleanly on developer laptops.
## License
Apache 2.0.
Apache 2.0 — see [LICENSE](LICENSE).
@@ -20,6 +20,9 @@ spec:
openAPIV3Schema:
type: object
required: [spec]
description: |
NodeConfig is the per-node operator-supplied configuration for the
flock CNI agent. Its name MUST equal the Kubernetes node name.
properties:
spec:
type: object
@@ -35,6 +38,25 @@ spec:
items:
type: string
description: IPv4 CIDR owned and aggregate-advertised by this node.
defaults:
type: object
description: |
Per-node baseline for which address families a pod receives
when its own annotations don't specify. Pod annotations
flock.fritzlab.net/ipv6 and flock.fritzlab.net/ipv4 always
override these defaults. Built-in fallback (when this block
or any field is omitted) is IPv6=true, IPv4=false.
properties:
ipv6:
type: boolean
description: |
Default IPv6 inclusion for pods on this node. Omit to
inherit the built-in baseline (true).
ipv4:
type: boolean
description: |
Default IPv4 inclusion for pods on this node. Omit to
inherit the built-in baseline (false).
bgp:
type: object
required: [asn, peers]
@@ -70,3 +92,9 @@ spec:
- name: CIDR4
type: string
jsonPath: .spec.cidr4
- name: DefV6
type: boolean
jsonPath: .spec.defaults.ipv6
- name: DefV4
type: boolean
jsonPath: .spec.defaults.ipv4
+28
@@ -20,6 +20,9 @@ spec:
openAPIV3Schema:
type: object
required: [spec]
description: |
NodeConfig is the per-node operator-supplied configuration for the
flock CNI agent. Its name MUST equal the Kubernetes node name.
properties:
spec:
type: object
@@ -35,6 +38,25 @@ spec:
items:
type: string
description: IPv4 CIDR owned and aggregate-advertised by this node.
defaults:
type: object
description: |
Per-node baseline for which address families a pod receives
when its own annotations don't specify. Pod annotations
flock.fritzlab.net/ipv6 and flock.fritzlab.net/ipv4 always
override these defaults. Built-in fallback (when this block
or any field is omitted) is IPv6=true, IPv4=false.
properties:
ipv6:
type: boolean
description: |
Default IPv6 inclusion for pods on this node. Omit to
inherit the built-in baseline (true).
ipv4:
type: boolean
description: |
Default IPv4 inclusion for pods on this node. Omit to
inherit the built-in baseline (false).
bgp:
type: object
required: [asn, peers]
@@ -70,6 +92,12 @@ spec:
- name: CIDR4
type: string
jsonPath: .spec.cidr4
- name: DefV6
type: boolean
jsonPath: .spec.defaults.ipv6
- name: DefV4
type: boolean
jsonPath: .spec.defaults.ipv4
---
apiVersion: v1
kind: ServiceAccount
+168 -39
@@ -5,77 +5,153 @@ import (
"net"
"strings"
flockv1alpha1 "code.fritzlab.net/fritzlab/flock/pkg/api/v1alpha1"
"code.fritzlab.net/fritzlab/flock/pkg/embed"
)
// annotationPrefix is the namespace under which all flock pod annotations
// live. Anything not starting with this prefix is ignored by the parser.
const annotationPrefix = "flock.fritzlab.net/"
// ParsedAnnotations is the typed view of a Pod's flock annotations.
type ParsedAnnotations struct {
// Recognised annotation keys (without the prefix).
const (
annIPv6 = "ipv6"
annIPv4 = "ipv4"
annCIDR6 = "cidr6"
annCIDR4 = "cidr4"
annIPAlgo = "ip-algo"
annAnycast = "anycast"
)
// FamilyDefaults is the per-call baseline for whether a pod receives an IPv6
// and/or IPv4 address. It is the merge of:
//
// 1. flock's built-in baseline (IPv6=true, IPv4=false), then
// 2. any NodeConfig.Spec.Defaults override the operator has applied to
// the local node.
//
// Pod-level `flock.fritzlab.net/ipv{6,4}` annotations override this baseline.
//
// Use FamilyDefaultsFromNodeConfig to compute a value from a NodeConfig,
// or BuiltinFamilyDefaults() if no NodeConfig is in scope.
type FamilyDefaults struct {
// WantV6 is the default-on value for IPv6 inclusion when the pod has no
// explicit ipv6 annotation.
WantV6 bool
// WantV4 is the default-on value for IPv4 inclusion when the pod has no
// explicit ipv4 annotation.
WantV4 bool
}
// BuiltinFamilyDefaults returns flock's hard-coded fallback: IPv6 only.
// This is the policy applied when no NodeConfig override is in effect.
//
// We define it as a function rather than a var so callers can't mutate the
// shared baseline at runtime.
func BuiltinFamilyDefaults() FamilyDefaults {
return FamilyDefaults{WantV6: true, WantV4: false}
}
// FamilyDefaultsFromNodeConfig resolves the effective per-node defaults,
// falling back to BuiltinFamilyDefaults for any field the NodeConfig leaves
// unset. A nil NodeConfig (or nil Spec.Defaults) returns the built-in
// baseline unchanged.
func FamilyDefaultsFromNodeConfig(nc *flockv1alpha1.NodeConfig) FamilyDefaults {
out := BuiltinFamilyDefaults()
if nc == nil || nc.Spec.Defaults == nil {
return out
}
if nc.Spec.Defaults.IPv6 != nil {
out.WantV6 = *nc.Spec.Defaults.IPv6
}
if nc.Spec.Defaults.IPv4 != nil {
out.WantV4 = *nc.Spec.Defaults.IPv4
}
return out
}
// ParsedAnnotations is the typed view of a pod's flock annotations after the
// node-level defaults have been merged in. All slices are non-nil only when
// the corresponding annotation was present and parsed cleanly.
type ParsedAnnotations struct {
// WantV6 is true when the pod should receive an IPv6 address.
WantV6 bool
// WantV4 is true when the pod should receive an IPv4 address.
WantV4 bool
// CIDR6 narrows IPv6 allocation to specific operator-approved sub-ranges
// of the node's CIDR6 set. nil/empty means "use any node CIDR6".
CIDR6 []*net.IPNet
// CIDR4 narrows IPv4 allocation. nil/empty means "use any node CIDR4".
CIDR4 []*net.IPNet
// IPAlgo is the ordered list of identity fields used to build the IID.
// nil/empty means "random IID".
IPAlgo []embed.Field
// Anycast is the set of anycast IPs to bind on the pod's loopback.
// nil/empty means "no anycast".
Anycast []net.IP
}
// ParseAnnotations applies the design-doc defaults (ipv6=true, ipv4=false)
// and validates the post-merge combination.
func ParseAnnotations(in map[string]string) (*ParsedAnnotations, error) {
out := &ParsedAnnotations{WantV6: true, WantV4: false}
// ParseAnnotations applies the supplied per-node defaults and validates the
// post-merge combination. It is pure — it does not consult NodeConfig or any
// global state — so it is safe to call from tests and fuzz targets.
//
// Annotation precedence: pod annotation > FamilyDefaults > built-in baseline.
// Callers compute FamilyDefaults via FamilyDefaultsFromNodeConfig and pass it
// in.
//
// Errors:
// - any unknown ipv6/ipv4 value (must be "true" or "false", case-insensitive)
// - any malformed cidr6/cidr4/anycast/ip-algo value
// - the post-merge combination resolves to neither IPv6 nor IPv4 (a pod
// must have at least one address)
func ParseAnnotations(in map[string]string, defaults FamilyDefaults) (*ParsedAnnotations, error) {
out := &ParsedAnnotations{WantV6: defaults.WantV6, WantV4: defaults.WantV4}
if v, ok := in[annotationPrefix+"ipv6"]; ok {
switch strings.ToLower(strings.TrimSpace(v)) {
case "true":
out.WantV6 = true
case "false":
out.WantV6 = false
default:
return nil, fmt.Errorf("annotation ipv6=%q: must be true or false", v)
if v, ok := in[annotationPrefix+annIPv6]; ok {
b, err := parseBoolAnnotation(annIPv6, v)
if err != nil {
return nil, err
}
out.WantV6 = b
}
if v, ok := in[annotationPrefix+"ipv4"]; ok {
switch strings.ToLower(strings.TrimSpace(v)) {
case "true":
out.WantV4 = true
case "false":
out.WantV4 = false
default:
return nil, fmt.Errorf("annotation ipv4=%q: must be true or false", v)
if v, ok := in[annotationPrefix+annIPv4]; ok {
b, err := parseBoolAnnotation(annIPv4, v)
if err != nil {
return nil, err
}
out.WantV4 = b
}
if !out.WantV6 && !out.WantV4 {
return nil, fmt.Errorf("ipv6=false requires ipv4=true (pod must have at least one address)")
return nil, fmt.Errorf("annotations + defaults resolve to no address family (need at least one of ipv6/ipv4)")
}
if v, ok := in[annotationPrefix+"cidr6"]; ok {
nets, err := parseCIDRList(v)
if v, ok := in[annotationPrefix+annCIDR6]; ok {
nets, err := parseCIDRList(v, familyV6)
if err != nil {
return nil, fmt.Errorf("annotation cidr6: %w", err)
return nil, fmt.Errorf("annotation %s: %w", annCIDR6, err)
}
out.CIDR6 = nets
}
if v, ok := in[annotationPrefix+"cidr4"]; ok {
nets, err := parseCIDRList(v)
if v, ok := in[annotationPrefix+annCIDR4]; ok {
nets, err := parseCIDRList(v, familyV4)
if err != nil {
return nil, fmt.Errorf("annotation cidr4: %w", err)
return nil, fmt.Errorf("annotation %s: %w", annCIDR4, err)
}
out.CIDR4 = nets
}
if v, ok := in[annotationPrefix+"ip-algo"]; ok {
if v, ok := in[annotationPrefix+annIPAlgo]; ok {
fields, err := parseIPAlgo(v)
if err != nil {
return nil, fmt.Errorf("annotation ip-algo: %w", err)
return nil, fmt.Errorf("annotation %s: %w", annIPAlgo, err)
}
out.IPAlgo = fields
}
if v, ok := in[annotationPrefix+"anycast"]; ok {
if v, ok := in[annotationPrefix+annAnycast]; ok {
ips, err := parseIPList(v)
if err != nil {
return nil, fmt.Errorf("annotation anycast: %w", err)
return nil, fmt.Errorf("annotation %s: %w", annAnycast, err)
}
out.Anycast = ips
}
@@ -83,7 +159,39 @@ func ParseAnnotations(in map[string]string) (*ParsedAnnotations, error) {
return out, nil
}
func parseCIDRList(s string) ([]*net.IPNet, error) {
// parseBoolAnnotation accepts only "true" or "false" (case-insensitive,
// surrounding whitespace tolerated). All other values — including "1", "0",
// "yes", "no" — are rejected so operator typos are caught loudly rather
// than silently producing the "false" default.
func parseBoolAnnotation(key, v string) (bool, error) {
switch strings.ToLower(strings.TrimSpace(v)) {
case "true":
return true, nil
case "false":
return false, nil
default:
return false, fmt.Errorf("annotation %s=%q: must be \"true\" or \"false\"", key, v)
}
}
// addressFamily distinguishes IPv6 vs IPv4 in places where the parser must
// validate the family of supplied CIDRs.
type addressFamily int
const (
familyAny addressFamily = iota
familyV6
familyV4
)
// parseCIDRList parses a comma-separated CIDR list. Whitespace around items
// is trimmed; empty items are silently dropped. The list must contain at
// least one entry post-trim.
//
// If `want` is familyV6 or familyV4 each entry's family is checked and a
// mismatch is reported, so an `flock.fritzlab.net/cidr6` annotation cannot
// silently slip a v4 prefix into the v6 allocator.
func parseCIDRList(s string, want addressFamily) ([]*net.IPNet, error) {
var out []*net.IPNet
for _, part := range strings.Split(s, ",") {
part = strings.TrimSpace(part)
@@ -94,6 +202,17 @@ func parseCIDRList(s string) ([]*net.IPNet, error) {
if err != nil {
return nil, fmt.Errorf("invalid CIDR %q: %w", part, err)
}
isV4 := n.IP.To4() != nil
switch want {
case familyV6:
if isV4 {
return nil, fmt.Errorf("CIDR %q is IPv4, expected IPv6", part)
}
case familyV4:
if !isV4 {
return nil, fmt.Errorf("CIDR %q is IPv6, expected IPv4", part)
}
}
out = append(out, n)
}
if len(out) == 0 {
@@ -102,6 +221,9 @@ func parseCIDRList(s string) ([]*net.IPNet, error) {
return out, nil
}
// parseIPList parses a comma-separated literal-IP list. Same trim/empty
// semantics as parseCIDRList. Mixed v4 and v6 entries are allowed (anycast
// pods can advertise both families together).
func parseIPList(s string) ([]net.IP, error) {
var out []net.IP
for _, part := range strings.Split(s, ",") {
@@ -121,6 +243,9 @@ func parseIPList(s string) ([]net.IP, error) {
return out, nil
}
// parseIPAlgo parses the ip-algo annotation. Each comma-separated token must
// match one of: namespace, pod, image. Empty tokens are dropped; unknown
// tokens are reported.
func parseIPAlgo(s string) ([]embed.Field, error) {
var out []embed.Field
for _, part := range strings.Split(s, ",") {
@@ -128,11 +253,11 @@ func parseIPAlgo(s string) ([]embed.Field, error) {
switch part {
case "":
continue
case "namespace":
case string(embed.FieldNamespace):
out = append(out, embed.FieldNamespace)
case "pod":
case string(embed.FieldPod):
out = append(out, embed.FieldPod)
case "image":
case string(embed.FieldImage):
out = append(out, embed.FieldImage)
default:
return nil, fmt.Errorf("unknown ip-algo field %q (allowed: namespace, pod, image)", part)
@@ -144,8 +269,8 @@ func parseIPAlgo(s string) ([]embed.Field, error) {
return out, nil
}
// CNIArgs parses the K=V;K=V CNI_ARGS string for the kubelet keys we care
// about. Other keys are ignored.
// CNIArgs is the typed view of the K=V;K=V CNI_ARGS string passed by kubelet.
// We only keep the fields the agent uses; unknown keys are ignored.
type CNIArgs struct {
PodNamespace string
PodName string
@@ -153,6 +278,10 @@ type CNIArgs struct {
InfraID string
}
// ParseCNIArgs is permissive by design — kubelet versions and runtime
// shims pass varying sets of keys. Malformed entries are skipped silently
// rather than failing the whole ADD; required-key validation is the
// caller's responsibility.
func ParseCNIArgs(s string) CNIArgs {
var a CNIArgs
for _, kv := range strings.Split(s, ";") {
+156
@@ -0,0 +1,156 @@
package agent
import (
"testing"
)
// FuzzParseAnnotations explores the joint space of {ipv6, ipv4, cidr6, cidr4,
// ip-algo, anycast} annotations with random byte strings. Every recognised
// key is exercised by deriving a deterministic input map from the fuzzed
// bytes; this gives the fuzzer reach into all parser branches at once.
//
// Properties checked:
//
// 1. The parser never panics on any input.
// 2. On nil-error return, the result satisfies the design-doc invariant
// that at least one of WantV6 / WantV4 is true (a pod always has at
// least one address).
// 3. Anycast IPs and IPAlgo fields are non-nil/empty only when the
// annotation was supplied; never spontaneously populated.
//
// Seed corpus covers known edge cases the spec must handle.
func FuzzParseAnnotations(f *testing.F) {
// Seeds: each entry is six strings — the literal raw values for the
// six parsed keys. Empty string for "key absent".
type seed struct {
ipv6, ipv4, cidr6, cidr4, ipAlgo, anycast string
}
seeds := []seed{
{},
{ipv4: "true"},
{ipv6: "false", ipv4: "true"},
{ipv6: "TRUE"},
{ipv6: " true "},
{ipv6: "yes"}, // invalid → expect error
{ipv4: "1"}, // invalid
{cidr6: ""}, // invalid (empty after split)
{cidr6: ","}, // invalid (empty after trim)
{cidr6: "2602:817:3000:f001::/64"}, // valid single
{cidr6: "2602:817:3000:f001::/64,"}, // trailing comma
{cidr6: " 2602:817:3000:f001::/64 "}, // surrounding whitespace
{cidr6: "2602:817:3000:f001::/64, 2602:817:3000:f002::/64"},
{cidr6: "10.0.0.0/8"}, // family mismatch
{cidr4: "172.25.210.0/24"}, // valid
{cidr4: "172.25.210.0/24,172.25.211.0/24"}, // multiple
{cidr4: "2602:817::/32"}, // family mismatch
{ipAlgo: "namespace,pod,image"},
{ipAlgo: "namespace, pod , image"}, // whitespace
{ipAlgo: "namespace,unknown"}, // invalid
{ipAlgo: ""}, // invalid (empty)
{ipAlgo: ","}, // invalid
{anycast: "2602:817:3000:ac::1"},
{anycast: "2602:817:3000:ac::1, 172.25.255.1"},
{anycast: "::1"}, // loopback (allowed at parse time)
{anycast: "fe80::1"}, // link-local (allowed at parse time)
{anycast: "::ffff:10.0.0.1"}, // v4-mapped v6
{anycast: "0.0.0.0"}, // unspecified
{anycast: "definitely-not-an-ip"}, // invalid
{anycast: ""}, // invalid
// Embedded NUL bytes
{ipv4: "true\x00"},
{cidr6: "2602:817:3000:f001::/64\x00"},
{anycast: "\x00\x00"},
// Unicode
{ipv4: "trüe"},
{ipAlgo: "námespace"},
// Very long
{cidr6: longString("2602:817:3000:f001::/64,", 4096)},
}
for _, s := range seeds {
f.Add(s.ipv6, s.ipv4, s.cidr6, s.cidr4, s.ipAlgo, s.anycast)
}
f.Fuzz(func(t *testing.T, ipv6, ipv4, cidr6, cidr4, ipAlgo, anycast string) {
in := map[string]string{}
// Treat empty as "key absent" so the seed table matches the run-time
// shape; Kubernetes annotations cannot have a nil value but they CAN
// be missing entirely. Empty-string-with-key is also a real case
// (operator typo), but this harness cannot express it: an empty fuzz
// argument always means "absent", so that case needs its own test.
if ipv6 != "" {
in[annotationPrefix+annIPv6] = ipv6
}
if ipv4 != "" {
in[annotationPrefix+annIPv4] = ipv4
}
if cidr6 != "" {
in[annotationPrefix+annCIDR6] = cidr6
}
if cidr4 != "" {
in[annotationPrefix+annCIDR4] = cidr4
}
if ipAlgo != "" {
in[annotationPrefix+annIPAlgo] = ipAlgo
}
if anycast != "" {
in[annotationPrefix+annAnycast] = anycast
}
got, err := ParseAnnotations(in, BuiltinFamilyDefaults())
if err != nil {
return // any error is acceptable; we only require no panic
}
// Property: at least one family must be selected.
if !got.WantV6 && !got.WantV4 {
t.Fatalf("parser accepted but produced no family: in=%#v", in)
}
// Property: optional fields populated only when their key was set.
if _, hasAlgo := in[annotationPrefix+annIPAlgo]; !hasAlgo && len(got.IPAlgo) != 0 {
t.Fatalf("IPAlgo populated without annotation")
}
if _, hasAny := in[annotationPrefix+annAnycast]; !hasAny && len(got.Anycast) != 0 {
t.Fatalf("Anycast populated without annotation")
}
if _, hasC6 := in[annotationPrefix+annCIDR6]; !hasC6 && len(got.CIDR6) != 0 {
t.Fatalf("CIDR6 populated without annotation")
}
if _, hasC4 := in[annotationPrefix+annCIDR4]; !hasC4 && len(got.CIDR4) != 0 {
t.Fatalf("CIDR4 populated without annotation")
}
})
}
// FuzzParseCNIArgs requires the parser to never panic on adversarial inputs.
// The parser is permissive by spec — it returns a CNIArgs with whatever it
// could extract — so the only invariant is "doesn't crash".
func FuzzParseCNIArgs(f *testing.F) {
f.Add("")
f.Add("=")
f.Add(";")
f.Add(";=;=;")
f.Add("K8S_POD_NAMESPACE=ns;K8S_POD_NAME=p")
f.Add("K8S_POD_NAMESPACE=ns;K8S_POD_NAME=p;K8S_POD_UID=abc;K8S_POD_INFRA_CONTAINER_ID=def")
f.Add("=value-only")
f.Add("key-only=")
f.Add("\x00\x00\x00")
f.Add("K8S_POD_NAMESPACE=\xff\xfe\xfd")
f.Add("K8S_POD_NAME=value;K8S_POD_NAME=other") // duplicate keys: last wins
// Long input
f.Add(longString("K8S_POD_NAME=x;", 4096))
f.Fuzz(func(t *testing.T, in string) {
_ = ParseCNIArgs(in)
})
}
// longString returns s repeated to total >= n bytes, useful for piling up
// realistic-looking but oversized inputs.
func longString(s string, n int) string {
if len(s) == 0 {
return ""
}
var b []byte
for len(b) < n {
b = append(b, s...)
}
return string(b)
}
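The permissive contract that FuzzParseCNIArgs relies on (skip malformed entries, last duplicate wins, never crash) can be sketched as below. This is an illustration only: `cniArgs` and `parseCNIArgs` are hypothetical stand-ins, and the real ParseCNIArgs is not part of this diff and may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// cniArgs is a hypothetical stand-in for the CNIArgs type used in the tests.
type cniArgs struct {
	PodNamespace, PodName, InfraID string
}

// parseCNIArgs splits "K=V;K=V" pairs and silently skips anything malformed.
// Later duplicates overwrite earlier ones, so the last value wins.
func parseCNIArgs(s string) cniArgs {
	var out cniArgs
	for _, kv := range strings.Split(s, ";") {
		k, v, ok := strings.Cut(kv, "=")
		if !ok || k == "" {
			continue // no '=' or empty key: skip the entry
		}
		switch k {
		case "K8S_POD_NAMESPACE":
			out.PodNamespace = v
		case "K8S_POD_NAME":
			out.PodName = v
		case "K8S_POD_INFRA_CONTAINER_ID":
			out.InfraID = v
		}
	}
	return out
}

func main() {
	a := parseCNIArgs(";;K8S_POD_NAMESPACE=ns;noequalshere;=novalue;K8S_POD_NAME=p")
	fmt.Printf("%+v\n", a)
}
```

Unknown keys such as `IgnoreUnknown` simply fall through the switch, which is what keeps the parser permissive.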
+172 -7
@@ -3,11 +3,68 @@ package agent
import (
"testing"
flockv1alpha1 "code.fritzlab.net/fritzlab/flock/pkg/api/v1alpha1"
"code.fritzlab.net/fritzlab/flock/pkg/embed"
)
func TestParseAnnotations_Defaults(t *testing.T) {
a, err := ParseAnnotations(nil)
// boolPtr returns a pointer to b — convenient for the *bool pointer fields
// in FamilyDefaults where nil means "unset".
func boolPtr(b bool) *bool { return &b }
func TestBuiltinFamilyDefaults(t *testing.T) {
d := BuiltinFamilyDefaults()
if !d.WantV6 || d.WantV4 {
t.Fatalf("built-in defaults wrong: v6=%v v4=%v (want true/false)", d.WantV6, d.WantV4)
}
}
func TestFamilyDefaultsFromNodeConfig_NilNodeConfig(t *testing.T) {
d := FamilyDefaultsFromNodeConfig(nil)
if d != BuiltinFamilyDefaults() {
t.Fatalf("nil NodeConfig should yield built-in defaults; got %+v", d)
}
}
func TestFamilyDefaultsFromNodeConfig_NilDefaults(t *testing.T) {
nc := &flockv1alpha1.NodeConfig{}
d := FamilyDefaultsFromNodeConfig(nc)
if d != BuiltinFamilyDefaults() {
t.Fatalf("missing Defaults should yield built-in; got %+v", d)
}
}
func TestFamilyDefaultsFromNodeConfig_PartialOverride(t *testing.T) {
nc := &flockv1alpha1.NodeConfig{
Spec: flockv1alpha1.NodeConfigSpec{
Defaults: &flockv1alpha1.FamilyDefaults{
IPv4: boolPtr(true),
},
},
}
d := FamilyDefaultsFromNodeConfig(nc)
// IPv6 was unset → keeps built-in true; IPv4 was set → flipped on.
if !d.WantV6 || !d.WantV4 {
t.Fatalf("partial override wrong: %+v (want v6=true, v4=true)", d)
}
}
func TestFamilyDefaultsFromNodeConfig_FullOverride(t *testing.T) {
nc := &flockv1alpha1.NodeConfig{
Spec: flockv1alpha1.NodeConfigSpec{
Defaults: &flockv1alpha1.FamilyDefaults{
IPv6: boolPtr(false),
IPv4: boolPtr(true),
},
},
}
d := FamilyDefaultsFromNodeConfig(nc)
if d.WantV6 || !d.WantV4 {
t.Fatalf("full override wrong: %+v (want v6=false, v4=true)", d)
}
}
func TestParseAnnotations_BuiltinDefaults(t *testing.T) {
a, err := ParseAnnotations(nil, BuiltinFamilyDefaults())
if err != nil {
t.Fatal(err)
}
@@ -16,10 +73,36 @@ func TestParseAnnotations_Defaults(t *testing.T) {
}
}
func TestParseAnnotations_DualStack(t *testing.T) {
func TestParseAnnotations_NodeDefaultsApplied(t *testing.T) {
// Node config says "IPv4 is on by default for this node".
d := FamilyDefaults{WantV6: true, WantV4: true}
a, err := ParseAnnotations(nil, d)
if err != nil {
t.Fatal(err)
}
if !a.WantV6 || !a.WantV4 {
t.Fatalf("node defaults not applied: %+v", a)
}
}
func TestParseAnnotations_AnnotationOverridesNodeDefault(t *testing.T) {
// Node says dual-stack by default; pod opts out of v4 explicitly.
d := FamilyDefaults{WantV6: true, WantV4: true}
a, err := ParseAnnotations(map[string]string{
annotationPrefix + "ipv4": "false",
}, d)
if err != nil {
t.Fatal(err)
}
if !a.WantV6 || a.WantV4 {
t.Fatalf("annotation override failed: %+v", a)
}
}
func TestParseAnnotations_DualStackViaAnnotation(t *testing.T) {
a, err := ParseAnnotations(map[string]string{
annotationPrefix + "ipv4": "true",
})
}, BuiltinFamilyDefaults())
if err != nil {
t.Fatal(err)
}
@@ -31,15 +114,49 @@ func TestParseAnnotations_DualStack(t *testing.T) {
func TestParseAnnotations_NoFamily(t *testing.T) {
if _, err := ParseAnnotations(map[string]string{
annotationPrefix + "ipv6": "false",
}); err == nil {
}, BuiltinFamilyDefaults()); err == nil {
t.Fatalf("expected error: ipv6=false ipv4=false")
}
}
func TestParseAnnotations_NoFamily_NodeDefaultsAlsoOff(t *testing.T) {
// Pathological NodeConfig that disables both families. Even with no pod
// annotation we must reject — otherwise a pod gets an empty allocation.
d := FamilyDefaults{WantV6: false, WantV4: false}
if _, err := ParseAnnotations(nil, d); err == nil {
t.Fatalf("expected error when both defaults are false")
}
}
func TestParseAnnotations_BoolStrictness(t *testing.T) {
// Common misuses that should be rejected so typos don't silently flip
// behaviour to the implicit-false default.
bad := []string{"1", "0", "yes", "no", "TrueFalse", " "}
for _, v := range bad {
_, err := ParseAnnotations(map[string]string{
annotationPrefix + "ipv4": v,
}, BuiltinFamilyDefaults())
if err == nil {
t.Errorf("expected error for ipv4=%q", v)
}
}
}
func TestParseAnnotations_BoolCaseInsensitive(t *testing.T) {
for _, v := range []string{"TRUE", "True", " true ", "FALSE", "False"} {
_, err := ParseAnnotations(map[string]string{
annotationPrefix + "ipv4": v,
}, BuiltinFamilyDefaults())
if err != nil {
t.Errorf("expected ipv4=%q to parse cleanly: %v", v, err)
}
}
}
func TestParseAnnotations_IPAlgo(t *testing.T) {
a, err := ParseAnnotations(map[string]string{
annotationPrefix + "ip-algo": "namespace,pod,image",
})
}, BuiltinFamilyDefaults())
if err != nil {
t.Fatal(err)
}
@@ -54,10 +171,18 @@ func TestParseAnnotations_IPAlgo(t *testing.T) {
}
}
func TestParseAnnotations_IPAlgo_Unknown(t *testing.T) {
if _, err := ParseAnnotations(map[string]string{
annotationPrefix + "ip-algo": "namespace,foo",
}, BuiltinFamilyDefaults()); err == nil {
t.Fatalf("expected unknown-field error")
}
}
func TestParseAnnotations_CIDR(t *testing.T) {
a, err := ParseAnnotations(map[string]string{
annotationPrefix + "cidr6": "2602:817:3000:f001::/64, 2602:817:3000:f002::/64",
})
}, BuiltinFamilyDefaults())
if err != nil {
t.Fatal(err)
}
@@ -66,9 +191,49 @@ func TestParseAnnotations_CIDR(t *testing.T) {
}
}
func TestParseAnnotations_CIDR_FamilyMismatch(t *testing.T) {
// v4 prefix in a cidr6 annotation must not silently slip through.
if _, err := ParseAnnotations(map[string]string{
annotationPrefix + "cidr6": "10.0.0.0/8",
}, BuiltinFamilyDefaults()); err == nil {
t.Fatalf("expected family mismatch error for a v4 prefix in cidr6")
}
if _, err := ParseAnnotations(map[string]string{
annotationPrefix + "cidr4": "2602:817::/32",
}, BuiltinFamilyDefaults()); err == nil {
t.Fatalf("expected family mismatch error for a v6 prefix in cidr4")
}
}
func TestParseAnnotations_Anycast_Mixed(t *testing.T) {
// Anycast accepts both families together — typical for a service that
// advertises one v6 and one v4 anycast IP.
a, err := ParseAnnotations(map[string]string{
annotationPrefix + "anycast": "2602:817:3000:ac::1, 172.25.255.1",
}, BuiltinFamilyDefaults())
if err != nil {
t.Fatal(err)
}
if len(a.Anycast) != 2 {
t.Fatalf("anycast len=%d", len(a.Anycast))
}
}
func TestParseCNIArgs(t *testing.T) {
args := ParseCNIArgs("IgnoreUnknown=1;K8S_POD_NAMESPACE=mail;K8S_POD_NAME=stalwart-0;K8S_POD_INFRA_CONTAINER_ID=abc123")
if args.PodNamespace != "mail" || args.PodName != "stalwart-0" || args.InfraID != "abc123" {
t.Fatalf("ParseCNIArgs got %+v", args)
}
}
func TestParseCNIArgs_EmptyAndMalformed(t *testing.T) {
// Permissive: malformed entries are skipped, never crash.
a := ParseCNIArgs("")
if a.PodName != "" {
t.Fatalf("empty input should yield empty CNIArgs, got %+v", a)
}
a = ParseCNIArgs(";;K8S_POD_NAMESPACE=ns;noequalshere;=novalue;K8S_POD_NAME=p")
if a.PodNamespace != "ns" || a.PodName != "p" {
t.Fatalf("permissive parse failed: %+v", a)
}
}
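The precedence these tests pin down for node-level defaults (an unset pointer keeps the built-in value, a set pointer overrides it) can be sketched as follows. The names `nodeDefaults`, `builtin` and `merge` are hypothetical; the real FamilyDefaultsFromNodeConfig is not shown in this diff and may be structured differently.

```go
package main

import "fmt"

// familyDefaults mirrors the two fields the tests read from FamilyDefaults.
type familyDefaults struct{ WantV6, WantV4 bool }

// nodeDefaults is a hypothetical stand-in for NodeConfig.Spec.Defaults.
// A nil pointer means "unset".
type nodeDefaults struct{ IPv6, IPv4 *bool }

// builtin is the baseline from the commit message: v6 on, v4 off.
func builtin() familyDefaults { return familyDefaults{WantV6: true, WantV4: false} }

// merge overlays only the fields the node config actually set.
func merge(nd *nodeDefaults) familyDefaults {
	d := builtin()
	if nd == nil {
		return d
	}
	if nd.IPv6 != nil {
		d.WantV6 = *nd.IPv6
	}
	if nd.IPv4 != nil {
		d.WantV4 = *nd.IPv4
	}
	return d
}

func main() {
	on := true
	fmt.Printf("%+v\n", merge(nil))
	fmt.Printf("%+v\n", merge(&nodeDefaults{IPv4: &on}))
}
```

Pod annotations would then be applied on top of the merged result, which is why ParseAnnotations takes the defaults as a parameter.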
+22
@@ -0,0 +1,22 @@
// Package agent owns the in-process flock-agent runtime. The agent is a
// single Linux DaemonSet pod per node and holds:
//
// - the durable per-node allocation file at /var/lib/flock/allocations.json
// (see Store in state.go),
// - an in-memory IPAM seeded from NodeConfig CIDRs and reconciled against
// the allocation file at startup (see ipam.go),
// - dynamic informers watching the per-node NodeConfig CR (nodeconfig.go)
// and the local-node Pod set (podinfo.go),
// - an RPC server speaking to the lightweight CNI plugin binary
// (cmd/flock and pkg/cni), so kubelet's CNI invocations are answered by
// a long-lived process rather than spinning up a fresh binary per ADD,
// - the BirdManager that renders bird.conf and triggers `birdc reload`
// on changes (bird.go), and
// - the AnycastReconciler that programs per-pod /128 and /32 host routes
// gated on Pod readiness (anycast_linux.go).
//
// The package is split between platform-specific files (anycast_linux.go,
// netns_linux.go, runtime_linux.go) and stub files used on non-Linux build
// hosts so the rest of the package — IPAM, parsing, store, RPC plumbing —
// stays unit-testable on macOS and Windows CI.
package agent
+2 -1
@@ -49,7 +49,8 @@ func (h *PodHandler) Add(ctx context.Context, req flockcni.Request) (*current.Re
return nil, fmt.Errorf("lookup pod: %w", err)
}
parsed, err := ParseAnnotations(pod.Annotations)
defaults := FamilyDefaultsFromNodeConfig(h.NodeConfig.Load())
parsed, err := ParseAnnotations(pod.Annotations, defaults)
if err != nil {
return nil, fmt.Errorf("parse annotations: %w", err)
}
+63
@@ -0,0 +1,63 @@
package agent
import (
"strings"
"testing"
)
func TestHostIfaceName_Format(t *testing.T) {
got := HostIfaceName("0123456789abcdef0123456789abcdef")
if !strings.HasPrefix(got, "flock") || len(got) != len("flock")+8 {
t.Fatalf("HostIfaceName=%q (want flock + 8 hex)", got)
}
}
func TestHostIfaceName_Determinism(t *testing.T) {
a := HostIfaceName("container-xyz")
b := HostIfaceName("container-xyz")
if a != b {
t.Fatalf("not deterministic: %s vs %s", a, b)
}
}
func TestHostIfaceName_DifferentInputs(t *testing.T) {
a := HostIfaceName("a")
b := HostIfaceName("b")
if a == b {
t.Fatalf("collision on trivial inputs")
}
}
// FuzzHostIfaceName ensures the host interface name generator never produces
// an output longer than IFNAMSIZ-1 (15 chars on Linux) and never panics.
// The name format is "flock" + 8 hex chars = 13 chars, always.
func FuzzHostIfaceName(f *testing.F) {
f.Add("")
f.Add("a")
f.Add("/var/run/netns/abc")
f.Add("0123456789abcdef0123456789abcdef")
f.Add(longString("x", 64*1024)) // very long containerID
f.Add("\x00\x00\x00")
f.Add("ünïcødé/контейнер")
f.Fuzz(func(t *testing.T, id string) {
got := HostIfaceName(id)
// Linux IFNAMSIZ is 16 (15 chars + NUL); ours must fit comfortably.
if len(got) > 15 {
t.Fatalf("HostIfaceName(%q)=%q exceeds 15 chars", id, got)
}
if !strings.HasPrefix(got, "flock") {
t.Fatalf("HostIfaceName(%q)=%q missing prefix", id, got)
}
// Suffix must be lowercase hex (8 chars).
suffix := got[len("flock"):]
if len(suffix) != 8 {
t.Fatalf("HostIfaceName(%q) suffix len=%d", id, len(suffix))
}
for _, c := range suffix {
if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')) {
t.Fatalf("HostIfaceName(%q)=%q has non-hex suffix", id, got)
}
}
})
}
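One way to satisfy the properties checked above ("flock" prefix, exactly 8 lowercase hex characters, deterministic, always within IFNAMSIZ-1) is to hash the container ID and keep four bytes of the digest. The sketch below assumes SHA-256; the diff does not show which hash HostIfaceName really uses, so `hostIfaceName` here is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hostIfaceName derives a fixed-length interface name from any container ID.
// Assumption: SHA-256 truncated to 4 bytes; the real function may differ.
func hostIfaceName(containerID string) string {
	sum := sha256.Sum256([]byte(containerID))
	// 4 bytes encode to 8 lowercase hex characters: "flock" + 8 = 13 chars.
	return "flock" + hex.EncodeToString(sum[:4])
}

func main() {
	fmt.Println(hostIfaceName("container-xyz"))
	fmt.Println(len(hostIfaceName("")))
}
```

Truncating a digest keeps the length constant for empty, very long, or non-UTF-8 inputs, at the cost of a small collision probability across 32 bits.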
+41 -18
@@ -62,13 +62,15 @@ func (cryptoRand) PickIndex(n int) int {
}
// AllocRequest describes a pending allocation. Values come from Pod metadata
// + annotations at CNI ADD time.
// + annotations at CNI ADD time, with per-node FamilyDefaults already merged
// in (see ParseAnnotations).
type AllocRequest struct {
ContainerID string
Namespace string
Pod string
// WantV6 / WantV4 come from the ipv6 / ipv4 annotations (defaults in
// design doc: ipv6=true, ipv4=false).
// WantV6 / WantV4 are the post-merge address family selection (pod
// annotation > NodeConfig.Spec.Defaults > built-in baseline). At least
// one MUST be true; Allocate rejects the request otherwise.
WantV6 bool
WantV4 bool
// AnnCIDR6 / AnnCIDR4 come from the cidr6 / cidr4 annotations. Empty
@@ -224,34 +226,36 @@ func (i *IPAM) allocV6(cidr *net.IPNet, req AllocRequest) (net.IP, error) {
// randomV6 picks a random /128 inside cidr. The network prefix bits are
// preserved from cidr.IP; the host bits are filled from the random source.
//
// Implementation: walk the 16 IPv6 bytes once. For each byte we ask whether
// it's entirely inside the network mask (skip), entirely inside the host
// portion (overwrite with random), or split (combine bits from both).
func (i *IPAM) randomV6(cidr *net.IPNet) (net.IP, error) {
ones, bits := cidr.Mask.Size()
if bits != 128 {
return nil, fmt.Errorf("cidr %s is not IPv6", cidr)
}
out := make(net.IP, 16)
out := make(net.IP, net.IPv6len)
copy(out, cidr.IP.To16())
hostBits := 128 - ones
rnd := make([]byte, 16)
rnd := make([]byte, net.IPv6len)
i.randSrc.FillIID(rnd)
// Merge rnd into out where mask bit is 0.
for b := 0; b < 16; b++ {
// Host bits start at bit index `ones`, byte `b`.
for b := 0; b < net.IPv6len; b++ {
byteStart := b * 8
byteEnd := byteStart + 8
if byteEnd <= ones {
continue // entirely network
}
if byteStart >= ones {
out[b] = rnd[b] // entirely host
switch {
case byteEnd <= ones:
// Entirely inside the network prefix — leave untouched.
continue
}
// Split byte: top (ones-byteStart) bits are network, rest is host.
case byteStart >= ones:
// Entirely inside the host portion — fully randomise.
out[b] = rnd[b]
default:
// Split byte: top (ones-byteStart) bits are network, rest host.
networkBits := ones - byteStart
hostMask := byte(0xFF) >> uint(networkBits)
out[b] = (out[b] & ^hostMask) | (rnd[b] & hostMask)
}
_ = hostBits
}
return out, nil
}
@@ -360,15 +364,34 @@ func toStringSlice(ns []*net.IPNet) []string {
return out
}
// canonical returns the textual form of ip in its native family, so the same
// host address is always represented identically regardless of whether it
// arrived as a 4-byte slice, a 16-byte v4-in-v6 slice, or a string-parsed
// net.IP. Used as the key for the in-use map.
//
// Returns "" for nil or malformed (neither 4- nor 16-byte) input; callers
// MUST treat the returned key as opaque and never use the empty string as
// a sentinel.
func canonical(ip net.IP) string {
if ip == nil {
return ""
}
if v4 := ip.To4(); v4 != nil {
return v4.String()
}
return ip.To16().String()
if v16 := ip.To16(); v16 != nil {
return v16.String()
}
return ""
}
// ipToU32 reads a 4-byte IPv4 net.IP into a uint32. The caller is expected
// to have already validated that ip is an IPv4 address; mis-use returns 0
// rather than panicking.
func ipToU32(ip net.IP) uint32 {
v4 := ip.To4()
if v4 == nil {
return 0
}
return uint32(v4[0])<<24 | uint32(v4[1])<<16 | uint32(v4[2])<<8 | uint32(v4[3])
}
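The split-byte handling in randomV6 is easiest to see on a prefix that does not end on a byte boundary. The sketch below restates the same merge as a standalone helper; `fillHostBits` and `onesIP` are names introduced here for illustration, and an all-ones "random" input stands in for the real random source.

```go
package main

import (
	"fmt"
	"net"
)

// fillHostBits keeps the first `ones` bits of prefix and takes every
// remaining bit from rnd. Both inputs are expected to be 16-byte IPv6.
func fillHostBits(prefix, rnd net.IP, ones int) net.IP {
	out := make(net.IP, net.IPv6len)
	copy(out, prefix.To16())
	for b := 0; b < net.IPv6len; b++ {
		byteStart := b * 8
		switch {
		case byteStart+8 <= ones:
			// Whole byte is network prefix: keep it.
		case byteStart >= ones:
			out[b] = rnd[b] // whole byte is host portion
		default:
			// Split byte: top (ones-byteStart) bits are network.
			hostMask := byte(0xFF) >> uint(ones-byteStart)
			out[b] = (out[b] &^ hostMask) | (rnd[b] & hostMask)
		}
	}
	return out
}

// onesIP returns 16 bytes with every bit set, a worst-case random input.
func onesIP() net.IP {
	ip := make(net.IP, net.IPv6len)
	for i := range ip {
		ip[i] = 0xFF
	}
	return ip
}

func main() {
	_, cidr, err := net.ParseCIDR("2001:db8::/60")
	if err != nil {
		panic(err)
	}
	ones, _ := cidr.Mask.Size()
	ip := fillHostBits(cidr.IP, onesIP(), ones)
	fmt.Println(ip, cidr.Contains(ip))
}
```

For a /60, byte 7 is the split byte: its top nibble stays with the prefix and only the low nibble is taken from the random input, so the result remains inside the CIDR.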
+169
@@ -0,0 +1,169 @@
package agent
import (
"net"
"testing"
)
// FuzzIPAM_Allocate runs randomly-driven Allocate/Release sequences against
// a /120 IPv6 + /28 IPv4 IPAM so the fuzzer can hit address exhaustion.
//
// Properties checked:
//
// 1. Allocate never panics regardless of the action stream.
// 2. No address is handed out twice while it is still live; a released
// address may be handed out again by a later successful Allocate.
// 3. A successful v6 allocation always yields an address inside the
// configured /120, and a successful v4 always inside the configured /28.
// 4. Every v4 result is a 4-byte address and never lands on .0 (network),
// .1 (gateway, by convention) or .15 (broadcast) of the /28.
//
// The fuzzed bytes are interpreted as an opcode stream:
// - bytes[i] & 0x03 selects the action: 0=alloc-v6, 1=alloc-v4,
// 2=alloc-dual, 3=release-most-recent.
// - the same byte stream also seeds the deterministic random source
// (fakeRand nibbles and IIDs), so different fuzzed bytes drive
// different IID/index choices.
func FuzzIPAM_Allocate(f *testing.F) {
f.Add([]byte{0, 0, 0, 0})
f.Add([]byte{1, 1, 1, 1})
f.Add([]byte{2, 2, 2, 2})
f.Add([]byte{0, 1, 2, 3})
f.Add([]byte(longString("\x00\x01\x02\x03", 256)))
f.Fuzz(func(t *testing.T, ops []byte) {
ipam, err := NewIPAM(
[]string{"2001:db8::/120"}, // 256 host slots; 16 bytes of fuzzed nibbles
[]string{"10.0.0.0/28"}, // 13 allocatable hosts (.2..14); .0, .1 and .15 are reserved
)
if err != nil {
t.Fatal(err)
}
// Deterministic source: replay nibbles cycled from `ops`.
fr := &fakeRand{
nibbles: append([]byte{}, ops...),
iids: [][]byte{
// 16 bytes of "host portion" — only the last byte matters
// for a /120 prefix.
makeIID(ops, 0),
makeIID(ops, 1),
makeIID(ops, 2),
makeIID(ops, 3),
},
}
if len(fr.nibbles) == 0 {
fr.nibbles = []byte{0}
}
ipam.randSrc = fr
net6 := mustNet(t, "2001:db8::/120")
net4 := mustNet(t, "10.0.0.0/28")
var live []AllocResult
seen := map[string]struct{}{}
for idx, op := range ops {
req := AllocRequest{ContainerID: idStr(idx)}
switch op & 0x03 {
case 0:
req.WantV6 = true
case 1:
req.WantV4 = true
case 2:
req.WantV6, req.WantV4 = true, true
case 3:
if len(live) == 0 {
continue
}
rel := live[len(live)-1]
live = live[:len(live)-1]
ipam.Release(rel.IP6, rel.IP4)
delete(seen, canonical(rel.IP6))
delete(seen, canonical(rel.IP4))
continue
}
res, err := ipam.Allocate(req)
if err != nil {
continue // exhaustion is acceptable
}
if req.WantV6 {
if res.IP6 == nil {
t.Fatalf("requested v6 but got nil")
}
if !net6.Contains(res.IP6) {
t.Fatalf("v6 %s outside /120", res.IP6)
}
if _, dup := seen[canonical(res.IP6)]; dup {
t.Fatalf("v6 %s duplicated", res.IP6)
}
seen[canonical(res.IP6)] = struct{}{}
}
if req.WantV4 {
if res.IP4 == nil {
t.Fatalf("requested v4 but got nil")
}
if !net4.Contains(res.IP4) {
t.Fatalf("v4 %s outside /28", res.IP4)
}
v4 := res.IP4.To4()
if v4 == nil {
t.Fatalf("v4 result not 4-byte: %s", res.IP4)
}
// Skip .0 (network) and .15 (broadcast). The allocator
// should also skip .1 (gateway) by convention.
last := v4[3]
if last == 0 || last == 1 || last == 15 {
t.Fatalf("v4 %s in reserved range", res.IP4)
}
if _, dup := seen[canonical(res.IP4)]; dup {
t.Fatalf("v4 %s duplicated", res.IP4)
}
seen[canonical(res.IP4)] = struct{}{}
}
live = append(live, res)
}
})
}
// FuzzCanonical asserts that canonical never panics and is idempotent.
func FuzzCanonical(f *testing.F) {
f.Add([]byte{})
f.Add([]byte{1, 2, 3, 4})
f.Add([]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0})
f.Add([]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff, 10, 0, 0, 1}) // v4-mapped v6
f.Add([]byte{0xff})
f.Fuzz(func(t *testing.T, b []byte) {
ip := net.IP(b)
s1 := canonical(ip)
// Idempotent: re-canonicalising the parsed form yields the same
// string for any non-empty result.
if s1 != "" {
parsed := net.ParseIP(s1)
if parsed == nil {
t.Fatalf("canonical(%v)=%q is not parseable as IP", b, s1)
}
if got := canonical(parsed); got != s1 {
t.Fatalf("not idempotent: %q -> %q", s1, got)
}
}
})
}
func makeIID(seed []byte, salt byte) []byte {
out := make([]byte, net.IPv6len)
for i := range out {
if i < len(seed) {
out[i] = seed[i] ^ salt
} else {
out[i] = salt
}
}
return out
}
func idStr(i int) string {
const hex = "0123456789abcdef"
return string([]byte{'c', '-', hex[(i>>4)&0xF], hex[i&0xF]})
}
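To see why the in-use map is keyed by canonical form, the sketch below repeats the hardened canonical function from the ipam.go hunk and feeds it three encodings of the same host. `sameHostKeys` is a helper introduced here purely for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// canonical is copied from the ipam.go change in this diff.
func canonical(ip net.IP) string {
	if ip == nil {
		return ""
	}
	if v4 := ip.To4(); v4 != nil {
		return v4.String()
	}
	if v16 := ip.To16(); v16 != nil {
		return v16.String()
	}
	return ""
}

// sameHostKeys returns the keys for three representations of 10.0.0.1:
// a 4-byte slice, the 16-byte form net.ParseIP produces, and v4-mapped v6.
func sameHostKeys() []string {
	forms := []net.IP{
		{10, 0, 0, 1},
		net.ParseIP("10.0.0.1"),
		net.ParseIP("::ffff:10.0.0.1"),
	}
	keys := make([]string, 0, len(forms))
	for _, f := range forms {
		keys = append(keys, canonical(f))
	}
	return keys
}

func main() {
	fmt.Println(sameHostKeys())
	fmt.Printf("%q\n", canonical(net.IP{0xff})) // malformed length
}
```

Without this normalisation the same address could occupy two map slots, and a release through one representation would leave the other marked in use.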
+85
@@ -0,0 +1,85 @@
//go:build linux
package netpol
import (
"bytes"
"context"
"fmt"
"os/exec"
"time"
)
// Applier hands rendered nft scripts to the kernel via `nft -f -`.
// nftables guarantees the entire script applies atomically — if any line
// is rejected, the previous ruleset stays intact.
//
// Applier maintains the last-applied script string and skips the exec
// when the new render is byte-identical, so the 30s self-heal tick on a
// quiet cluster is cheap.
type Applier struct {
// NftPath is the path to the nft binary. Empty means "look up `nft`
// on PATH". Tests set this to a fake.
NftPath string
// Timeout bounds an individual nft invocation; if zero, defaults to
// 5 seconds.
Timeout time.Duration
last string
}
// Apply runs `nft -f -` with the supplied script. Idempotent: if script
// equals the last successful application, this is a no-op.
//
// Returns an error from nft (with stderr captured) if the script is
// malformed or the kernel rejects it.
func (a *Applier) Apply(ctx context.Context, script string) error {
if script == a.last {
return nil
}
timeout := a.Timeout
if timeout == 0 {
timeout = 5 * time.Second
}
bin := a.NftPath
if bin == "" {
bin = "nft"
}
cctx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
cmd := exec.CommandContext(cctx, bin, "-f", "-")
cmd.Stdin = bytes.NewBufferString(script)
var stderr bytes.Buffer
cmd.Stderr = &stderr
if err := cmd.Run(); err != nil {
return fmt.Errorf("nft -f -: %w: %s", err, stderr.String())
}
a.last = script
return nil
}
// Clear tears down the flock NetworkPolicy table — used by graceful
// shutdown so a stopping agent doesn't leave stale enforcement behind.
// Best-effort: if nft is missing or the table doesn't exist, returns
// nil.
func (a *Applier) Clear(ctx context.Context) error {
timeout := a.Timeout
if timeout == 0 {
timeout = 5 * time.Second
}
bin := a.NftPath
if bin == "" {
bin = "nft"
}
cctx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
cmd := exec.CommandContext(cctx, bin, "destroy", "table", "inet", "flock_netpol")
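// Best-effort: ignore the error. `nft destroy` already succeeds when the
// table is absent, so a failure here means nft is missing, timed out, or
// too old to know `destroy`. Reset the cache either way so a later Apply
// of the same script is not skipped against a table that may be gone.
_ = cmd.Run()
a.last = ""
return nil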
}
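The dedup rule in Apply matters mostly on the failure path: the cache is updated only after a successful run, so a rejected script is retried on the next reconcile instead of being mistaken for "already applied". A minimal sketch with the nft call replaced by an injectable function (`dedupApplier`, `run` and `errRejected` are hypothetical names, not part of this package):

```go
package main

import (
	"errors"
	"fmt"
)

// errRejected simulates nft refusing a script.
var errRejected = errors.New("rejected")

// dedupApplier is a stripped-down stand-in for Applier: run replaces the
// `nft -f -` exec so the caching logic can be exercised without nft.
type dedupApplier struct {
	run  func(script string) error
	last string
}

func (a *dedupApplier) apply(script string) error {
	if script == a.last {
		return nil // byte-identical to the last success: skip the exec
	}
	if err := a.run(script); err != nil {
		return err // cache untouched, so the next call retries
	}
	a.last = script
	return nil
}

func main() {
	calls, fail := 0, true
	a := &dedupApplier{run: func(string) error {
		calls++
		if fail {
			return errRejected
		}
		return nil
	}}
	fmt.Println(a.apply("table inet t {}"), calls) // fails, 1 call
	fail = false
	fmt.Println(a.apply("table inet t {}"), calls) // retried, 2 calls
	fmt.Println(a.apply("table inet t {}"), calls) // deduped, still 2
}
```

Note that neither this sketch nor the Applier shown above guards `last` with a lock, so callers are assumed to serialise Apply calls.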
+16
@@ -0,0 +1,16 @@
//go:build !linux
package netpol
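import (
"context"
"time"
)
// Applier is a no-op on non-Linux build hosts so unit tests run on macOS
// without nft. Field types mirror the Linux build so callers compile
// unchanged on either platform.
type Applier struct {
NftPath string
Timeout time.Duration
last string
}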
}
func (a *Applier) Apply(_ context.Context, script string) error { a.last = script; return nil }
func (a *Applier) Clear(_ context.Context) error { a.last = ""; return nil }
+250
@@ -0,0 +1,250 @@
package netpol
import (
"net"
"strings"
"testing"
corev1 "k8s.io/api/core/v1"
netv1 "k8s.io/api/networking/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/intstr"
)
// These fixtures mirror the three NetworkPolicies live in the sjc001
// cluster on 2026-04-25. They serve as integration-shaped tests: the
// translator + renderer must produce a sensible nft script for each.
//
// Source of truth (refresh by running `kubectl get netpol -A -o yaml`):
//
// - calico-apiserver/allow-apiserver
// - remote-proxies/lodge-home-assistant-ingress
// - storage/garage-admin-restrict
// allowApiserverPolicy: TCP/5443 ingress to apiserver=true pods, no peer
// restriction (allow-from-anywhere on that port).
func allowApiserverPolicy() netv1.NetworkPolicy {
tcp := corev1.ProtocolTCP
port := intstr.FromInt32(5443)
return netv1.NetworkPolicy{
ObjectMeta: metav1.ObjectMeta{Namespace: "calico-apiserver", Name: "allow-apiserver"},
Spec: netv1.NetworkPolicySpec{
PodSelector: metav1.LabelSelector{MatchLabels: map[string]string{"apiserver": "true"}},
PolicyTypes: []netv1.PolicyType{netv1.PolicyTypeIngress},
Ingress: []netv1.NetworkPolicyIngressRule{{
Ports: []netv1.NetworkPolicyPort{{Protocol: &tcp, Port: &port}},
}},
},
}
}
// lodgeHomeAssistantPolicy: TCP/8080 from any pod in the `edge` namespace
// to pods labelled app=lodge-home-assistant.
func lodgeHomeAssistantPolicy() netv1.NetworkPolicy {
tcp := corev1.ProtocolTCP
port := intstr.FromInt32(8080)
return netv1.NetworkPolicy{
ObjectMeta: metav1.ObjectMeta{Namespace: "remote-proxies", Name: "lodge-home-assistant-ingress"},
Spec: netv1.NetworkPolicySpec{
PodSelector: metav1.LabelSelector{MatchLabels: map[string]string{"app": "lodge-home-assistant"}},
PolicyTypes: []netv1.PolicyType{netv1.PolicyTypeIngress},
Ingress: []netv1.NetworkPolicyIngressRule{{
From: []netv1.NetworkPolicyPeer{{
NamespaceSelector: &metav1.LabelSelector{
MatchLabels: map[string]string{"kubernetes.io/metadata.name": "edge"},
},
}},
Ports: []netv1.NetworkPolicyPort{{Protocol: &tcp, Port: &port}},
}},
},
}
}
// garageAdminPolicy: complex two-rule policy.
//
// 1. Allow TCP/{3900, 80, 3901} from anywhere.
// 2. Allow TCP/3903 only from pods in `edge` or `storage`.
func garageAdminPolicy() netv1.NetworkPolicy {
tcp := corev1.ProtocolTCP
p3900 := intstr.FromInt32(3900)
p80 := intstr.FromInt32(80)
p3901 := intstr.FromInt32(3901)
p3903 := intstr.FromInt32(3903)
return netv1.NetworkPolicy{
ObjectMeta: metav1.ObjectMeta{Namespace: "storage", Name: "garage-admin-restrict"},
Spec: netv1.NetworkPolicySpec{
PodSelector: metav1.LabelSelector{MatchLabels: map[string]string{"app": "garage"}},
PolicyTypes: []netv1.PolicyType{netv1.PolicyTypeIngress},
Ingress: []netv1.NetworkPolicyIngressRule{
{
Ports: []netv1.NetworkPolicyPort{
{Protocol: &tcp, Port: &p3900},
{Protocol: &tcp, Port: &p80},
{Protocol: &tcp, Port: &p3901},
},
},
{
From: []netv1.NetworkPolicyPeer{
{NamespaceSelector: &metav1.LabelSelector{
MatchLabels: map[string]string{"kubernetes.io/metadata.name": "edge"},
}},
{NamespaceSelector: &metav1.LabelSelector{
MatchLabels: map[string]string{"kubernetes.io/metadata.name": "storage"},
}},
},
Ports: []netv1.NetworkPolicyPort{{Protocol: &tcp, Port: &p3903}},
},
},
},
}
}
// TestClusterFixture_AllowApiserver — pod selected by the policy gets
// isolated; the rendered script accepts TCP/5443 from anywhere.
func TestClusterFixture_AllowApiserver(t *testing.T) {
pod := Pod{
Namespace: "calico-apiserver",
Name: "calico-apiserver-1",
Labels: map[string]string{"apiserver": "true"},
HostIface: "flock00000001",
IPs: []net.IP{mustIP("2001:db8::1")},
}
out, err := Translate(Inputs{
LocalPods: []Pod{pod},
Policies: []netv1.NetworkPolicy{allowApiserverPolicy()},
}, nil)
if err != nil {
t.Fatal(err)
}
in, _ := isolationFor(out, "calico-apiserver/calico-apiserver-1")
if !in {
t.Fatalf("apiserver pod should be isolated for ingress")
}
script := Render(out)
if !strings.Contains(script, "tcp dport 5443 accept") {
t.Fatalf("expected TCP/5443 allow:\n%s", script)
}
// No peer filter — allow-all-on-port.
if strings.Contains(script, "ip6 saddr {") || strings.Contains(script, "ip saddr {") {
t.Fatalf("expected no peer filter for allow-from-anywhere:\n%s", script)
}
}
// TestClusterFixture_LodgeHomeAssistant — pod isolated; only TCP/8080
// from edge namespace is allowed.
func TestClusterFixture_LodgeHomeAssistant(t *testing.T) {
pod := Pod{
Namespace: "remote-proxies",
Name: "lodge-home-assistant-0",
Labels: map[string]string{"app": "lodge-home-assistant"},
HostIface: "flock00000002",
IPs: []net.IP{mustIP("2001:db8::2")},
}
traefik := PeerPod{
Namespace: "edge", Name: "traefik-0",
Labels: map[string]string{"app": "traefik"},
IPs: []net.IP{mustIP("2001:db8::aa")},
}
stranger := PeerPod{
Namespace: "default", Name: "random",
Labels: map[string]string{"app": "random"},
IPs: []net.IP{mustIP("2001:db8::bb")},
}
out, err := Translate(Inputs{
LocalPods: []Pod{pod},
PeerPods: []PeerPod{traefik, stranger},
Namespaces: []Namespace{
{Name: "edge", Labels: map[string]string{"kubernetes.io/metadata.name": "edge"}},
{Name: "default", Labels: map[string]string{"kubernetes.io/metadata.name": "default"}},
{Name: "remote-proxies", Labels: map[string]string{"kubernetes.io/metadata.name": "remote-proxies"}},
},
Policies: []netv1.NetworkPolicy{lodgeHomeAssistantPolicy()},
}, nil)
if err != nil {
t.Fatal(err)
}
if len(out.Rules) != 1 {
t.Fatalf("expected 1 rule, got %d", len(out.Rules))
}
r := out.Rules[0]
// Peer should be exactly traefik's IP, not stranger's.
got := map[string]bool{}
for _, c := range r.PeerCIDRs {
got[c.IP.String()] = true
}
if !got["2001:db8::aa"] {
t.Fatalf("traefik IP missing from rule: %v", got)
}
if got["2001:db8::bb"] {
t.Fatalf("stranger IP leaked into rule")
}
script := Render(out)
if !strings.Contains(script, "tcp dport 8080 accept") {
t.Fatalf("expected TCP/8080 allow:\n%s", script)
}
}
// TestClusterFixture_Garage — verifies the two-rule policy:
//
// 1. ports {3900, 80, 3901} accept from any peer
// 2. port 3903 accept only from edge or storage namespaces
func TestClusterFixture_Garage(t *testing.T) {
pod := Pod{
Namespace: "storage", Name: "garage-0",
Labels: map[string]string{"app": "garage"},
HostIface: "flock00000003",
IPs: []net.IP{mustIP("2001:db8::3")},
}
storagePeer := PeerPod{
Namespace: "storage", Name: "garage-1",
Labels: map[string]string{"app": "garage"},
IPs: []net.IP{mustIP("2001:db8::31")},
}
edgePeer := PeerPod{
Namespace: "edge", Name: "traefik-0",
Labels: map[string]string{"app": "traefik"},
IPs: []net.IP{mustIP("2001:db8::41")},
}
stranger := PeerPod{
Namespace: "default", Name: "random",
Labels: map[string]string{"app": "random"},
IPs: []net.IP{mustIP("2001:db8::ff")},
}
out, err := Translate(Inputs{
LocalPods: []Pod{pod},
PeerPods: []PeerPod{storagePeer, edgePeer, stranger},
Namespaces: []Namespace{
{Name: "edge", Labels: map[string]string{"kubernetes.io/metadata.name": "edge"}},
{Name: "storage", Labels: map[string]string{"kubernetes.io/metadata.name": "storage"}},
{Name: "default", Labels: map[string]string{"kubernetes.io/metadata.name": "default"}},
},
Policies: []netv1.NetworkPolicy{garageAdminPolicy()},
}, nil)
if err != nil {
t.Fatal(err)
}
// Two ingress rules in the source policy → two Rules out (one per
// peer set, ports inline).
if len(out.Rules) != 2 {
t.Fatalf("expected 2 rules (one per ingress entry), got %d", len(out.Rules))
}
script := Render(out)
for _, want := range []string{
"tcp dport 3900 accept",
"tcp dport 80 accept",
"tcp dport 3901 accept",
"tcp dport 3903 accept",
} {
if !strings.Contains(script, want) {
t.Errorf("missing %q in script:\n%s", want, script)
}
}
// The 3903 rule must carry a peer filter for both edge and storage
// peer IPs but not the stranger.
if !strings.Contains(script, "2001:db8::31/128") || !strings.Contains(script, "2001:db8::41/128") {
t.Fatalf("expected edge+storage peer IPs in 3903 rule:\n%s", script)
}
if strings.Contains(script, "2001:db8::ff/128") {
t.Fatalf("stranger IP must not appear:\n%s", script)
}
}
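The substrings these fixture tests look for suggest the general shape of a rendered allow rule: an optional source-address set followed by the port match. The sketch below is a guess at that shape for a single IPv6 rule, not the package's Render; `renderAllow` is hypothetical. Sorting the peers illustrates why deterministic output matters for the byte-identical dedup in the apply stage.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderAllow emits one nft rule line. Peers are sorted so the same input
// set always renders to the same bytes regardless of map iteration order.
func renderAllow(peers []string, port int) string {
	sorted := append([]string(nil), peers...)
	sort.Strings(sorted)
	var b strings.Builder
	if len(sorted) > 0 {
		fmt.Fprintf(&b, "ip6 saddr { %s } ", strings.Join(sorted, ", "))
	}
	fmt.Fprintf(&b, "tcp dport %d accept", port)
	return b.String()
}

func main() {
	fmt.Println(renderAllow(nil, 5443))
	fmt.Println(renderAllow([]string{"2001:db8::41/128", "2001:db8::31/128"}, 3903))
}
```

An empty peer list renders no address filter at all, which corresponds to the allow-from-anywhere case in the apiserver fixture.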
+44
@@ -0,0 +1,44 @@
// Package netpol implements Kubernetes NetworkPolicy enforcement for flock.
//
// # Model
//
// NetworkPolicy is a Kubernetes-native API (`networking.k8s.io/v1`) that
// describes which pods may receive traffic (Ingress) and/or initiate
// traffic (Egress). The semantics are isolation by selection: a pod that is
// selected by *any* NetworkPolicy in a given direction becomes default-deny
// in that direction and is then allowed only the union of the "allow" rules
// from every policy that selects it. A pod selected by no policy is
// unrestricted.
//
// flock enforces these semantics with nftables. Each agent is responsible
// for the pods scheduled on its own node. Addresses for podSelector and
// namespaceSelector peers come from a cluster-wide informer set so the
// agent can resolve peers that live elsewhere; ipBlock peers are literal
// CIDRs taken from the policy itself.
//
// # Pipeline
//
// The work is split into four stages with hard boundaries between them so
// each can be tested in isolation:
//
// 1. Informers (informers.go): watch NetworkPolicies, Namespaces, and
// all Pods in the cluster. Maintain indices the translator can query.
//
// 2. Translator (translator.go): pure function from Inputs
// (NetworkPolicy set, Namespace set, cluster-wide Pod set, local-node
// pod set) to Output (the allow Rules plus the set of isolated
// (pod, direction) pairs). No I/O, no hidden state, so it is
// straightforward to fuzz and unit test. Implements the default-deny
// semantics and the peer-resolution rules from the NetworkPolicy spec.
//
// 3. Renderer (render.go): pure function from Output to an nft script
// (string). Output is deterministic so the apply stage can de-dupe.
//
// 4. Apply (apply_linux.go): shell out to `nft -f -` for an atomic
// reconfiguration. nftables applies the whole script as a single
// transaction; if any command fails, none of it takes effect.
//
// # Why nftables (and not eBPF)
//
// Atomic ruleset transactions, kernel-native, no userspace eBPF loader to
// maintain, and behaviour an operator can read directly with
// `nft list ruleset`. The cost is that the base chain is evaluated
// linearly, one jump rule per (pod, direction, address family), which is
// fine at the cluster sizes flock targets.
package netpol
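The "isolation by selection" model described in the package comment can be illustrated with a small standalone sketch. The `Policy` type and `ingressAllowed` function below are hypothetical simplifications (ingress only, exact label match, peers identified by name); they are not part of the netpol package and only show how default-deny and the union of allow rules combine.

```go
package main

import "fmt"

// Policy is a hypothetical, heavily simplified stand-in for a NetworkPolicy:
// it selects pods by exact label match and allows ingress from named peers.
type Policy struct {
	Selector  map[string]string // an empty selector matches every pod
	AllowFrom []string          // peer names allowed to connect
}

// selects reports whether every selector label is present on the pod.
func (p Policy) selects(podLabels map[string]string) bool {
	for k, v := range p.Selector {
		if got, ok := podLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// ingressAllowed applies isolation by selection: a pod selected by no policy
// is unrestricted; a selected pod accepts a peer only if at least one of the
// selecting policies lists that peer (the union of all allow rules).
func ingressAllowed(podLabels map[string]string, policies []Policy, peer string) bool {
	selected := false
	for _, p := range policies {
		if !p.selects(podLabels) {
			continue
		}
		selected = true
		for _, a := range p.AllowFrom {
			if a == peer {
				return true
			}
		}
	}
	return !selected
}

func main() {
	web := map[string]string{"app": "web"}
	policies := []Policy{
		{Selector: map[string]string{"app": "web"}, AllowFrom: []string{"edge"}},
		{Selector: map[string]string{}, AllowFrom: []string{"monitoring"}},
	}
	fmt.Println(ingressAllowed(web, nil, "stranger"))        // no policy selects the pod
	fmt.Println(ingressAllowed(web, policies, "edge"))       // allowed by the first policy
	fmt.Println(ingressAllowed(web, policies, "monitoring")) // allowed by the second policy
	fmt.Println(ingressAllowed(web, policies, "stranger"))   // selected, but no rule allows it
}
```

A policy with an empty `AllowFrom` list still marks the pod as selected, which is how a pure default-deny policy behaves in this sketch.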
+222
@@ -0,0 +1,222 @@
package netpol
import (
"context"
"fmt"
"log/slog"
"net"
"sync"
"time"
corev1 "k8s.io/api/core/v1"
netv1 "k8s.io/api/networking/v1"
"k8s.io/client-go/informers"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/rest"
"k8s.io/client-go/tools/cache"
)
// World aggregates the cluster-wide caches the reconciler queries on
// every pass: NetworkPolicies, Namespaces, and all Pods (for peer
// resolution). The maps are guarded by mu; read them through the
// snapshot accessors, which are safe for concurrent use.
type World struct {
logger *slog.Logger
mu sync.RWMutex
policies map[string]netv1.NetworkPolicy // key = ns/name
namespaces map[string]Namespace
peerPods map[string]PeerPod // key = ns/name
onChange []func()
}
// NewWorld returns an empty World. Callers should call Start to populate
// it; before Start, the snapshot accessors return empty slices.
func NewWorld(logger *slog.Logger) *World {
return &World{
logger: logger,
policies: map[string]netv1.NetworkPolicy{},
namespaces: map[string]Namespace{},
peerPods: map[string]PeerPod{},
}
}
// OnChange registers a callback fired (synchronously, inside the informer
// event handler) whenever any watched object changes. Callbacks should
// return quickly; the reconciler registers its non-blocking Trigger here
// so that bursts of events coalesce into a single pass.
func (w *World) OnChange(f func()) {
w.mu.Lock()
defer w.mu.Unlock()
w.onChange = append(w.onChange, f)
}
func (w *World) fireChange() {
w.mu.RLock()
cbs := append([]func(){}, w.onChange...)
w.mu.RUnlock()
for _, f := range cbs {
f()
}
}
// Start launches three informers (NetworkPolicy, Namespace, Pod) against
// the cluster API. It blocks until each cache reports synced. The caller
// is responsible for cancelling ctx on shutdown.
func (w *World) Start(ctx context.Context, cfg *rest.Config) error {
cs, err := kubernetes.NewForConfig(cfg)
if err != nil {
return fmt.Errorf("kubernetes client: %w", err)
}
factory := informers.NewSharedInformerFactory(cs, 10*time.Minute)
npInformer := factory.Networking().V1().NetworkPolicies().Informer()
nsInformer := factory.Core().V1().Namespaces().Informer()
podInformer := factory.Core().V1().Pods().Informer()
if _, err := npInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
AddFunc: func(obj interface{}) { w.onPolicy(obj, false) },
UpdateFunc: func(_, n interface{}) { w.onPolicy(n, false) },
DeleteFunc: func(obj interface{}) { w.onPolicy(obj, true) },
}); err != nil {
return fmt.Errorf("add netpol handler: %w", err)
}
if _, err := nsInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
AddFunc: func(obj interface{}) { w.onNamespace(obj, false) },
UpdateFunc: func(_, n interface{}) { w.onNamespace(n, false) },
DeleteFunc: func(obj interface{}) { w.onNamespace(obj, true) },
}); err != nil {
return fmt.Errorf("add ns handler: %w", err)
}
if _, err := podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
AddFunc: func(obj interface{}) { w.onPod(obj, false) },
UpdateFunc: func(_, n interface{}) { w.onPod(n, false) },
DeleteFunc: func(obj interface{}) { w.onPod(obj, true) },
}); err != nil {
return fmt.Errorf("add pod handler: %w", err)
}
w.logger.Info("netpol informers starting")
factory.Start(ctx.Done())
if !cache.WaitForCacheSync(ctx.Done(),
npInformer.HasSynced, nsInformer.HasSynced, podInformer.HasSynced) {
return fmt.Errorf("netpol informer caches failed to sync")
}
w.logger.Info("netpol informers synced",
"netpols", len(w.snapshotPolicies()),
"namespaces", len(w.snapshotNamespaces()),
"peer_pods", len(w.snapshotPeerPods()))
return nil
}
// unwrapDFSU lifts a DeletedFinalStateUnknown wrapper if present.
func unwrapDFSU(obj interface{}) interface{} {
if d, ok := obj.(cache.DeletedFinalStateUnknown); ok {
return d.Obj
}
return obj
}
func (w *World) onPolicy(obj interface{}, deleted bool) {
p, ok := unwrapDFSU(obj).(*netv1.NetworkPolicy)
if !ok || p == nil {
return
}
key := p.Namespace + "/" + p.Name
w.mu.Lock()
if deleted {
delete(w.policies, key)
} else {
w.policies[key] = *p
}
w.mu.Unlock()
w.fireChange()
}
func (w *World) onNamespace(obj interface{}, deleted bool) {
ns, ok := unwrapDFSU(obj).(*corev1.Namespace)
if !ok || ns == nil {
return
}
w.mu.Lock()
if deleted {
delete(w.namespaces, ns.Name)
} else {
w.namespaces[ns.Name] = Namespace{Name: ns.Name, Labels: ns.Labels}
}
w.mu.Unlock()
w.fireChange()
}
func (w *World) onPod(obj interface{}, deleted bool) {
pod, ok := unwrapDFSU(obj).(*corev1.Pod)
if !ok || pod == nil {
return
}
key := pod.Namespace + "/" + pod.Name
w.mu.Lock()
if deleted {
delete(w.peerPods, key)
} else {
w.peerPods[key] = PeerPod{
Namespace: pod.Namespace,
Name: pod.Name,
Labels: pod.Labels,
IPs: podIPs(pod),
}
}
w.mu.Unlock()
w.fireChange()
}
// podIPs extracts every parseable PodIP from the status. Pods without
// status (still scheduling) yield an empty slice, which is safe for the
// translator.
func podIPs(p *corev1.Pod) []net.IP {
out := make([]net.IP, 0, len(p.Status.PodIPs))
for _, addr := range p.Status.PodIPs {
ip := net.ParseIP(addr.IP)
if ip == nil {
continue
}
out = append(out, ip)
}
if len(out) == 0 && p.Status.PodIP != "" {
// Older clusters may populate PodIP but not PodIPs; tolerate both.
if ip := net.ParseIP(p.Status.PodIP); ip != nil {
out = append(out, ip)
}
}
return out
}
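// snapshotPolicies returns the policy map's values in a fresh slice. The
// copy is shallow: nested slices and maps still alias the cached objects,
// so callers must treat them as read-only.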
func (w *World) snapshotPolicies() []netv1.NetworkPolicy {
w.mu.RLock()
defer w.mu.RUnlock()
out := make([]netv1.NetworkPolicy, 0, len(w.policies))
for _, p := range w.policies {
out = append(out, p)
}
return out
}
// snapshotNamespaces returns the namespace map's values in a fresh slice
// (shallow copy: the Labels maps are shared with the cache).
func (w *World) snapshotNamespaces() []Namespace {
w.mu.RLock()
defer w.mu.RUnlock()
out := make([]Namespace, 0, len(w.namespaces))
for _, n := range w.namespaces {
out = append(out, n)
}
return out
}
// snapshotPeerPods returns the peer-pod map's values in a fresh slice
// (shallow copy: the Labels maps and IPs slices are shared with the cache).
func (w *World) snapshotPeerPods() []PeerPod {
w.mu.RLock()
defer w.mu.RUnlock()
out := make([]PeerPod, 0, len(w.peerPods))
for _, p := range w.peerPods {
out = append(out, p)
}
return out
}
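The locking pattern used by `fireChange` (copy the callback slice under the read lock, then invoke the copies after releasing it) can be seen in isolation in the sketch below. The `notifier` type and `demo` function are hypothetical reductions written for illustration, assuming only the standard library; the point is that a callback may register further callbacks without deadlocking.

```go
package main

import (
	"fmt"
	"sync"
)

// notifier is a hypothetical, stripped-down callback registry.
type notifier struct {
	mu  sync.RWMutex
	cbs []func()
}

// OnChange registers a callback under the write lock.
func (n *notifier) OnChange(f func()) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.cbs = append(n.cbs, f)
}

// fire copies the callback slice while holding the read lock and calls the
// copies only after releasing it, so a callback can safely call OnChange.
func (n *notifier) fire() {
	n.mu.RLock()
	cbs := append([]func(){}, n.cbs...)
	n.mu.RUnlock()
	for _, f := range cbs {
		f()
	}
}

// demo fires twice and returns how many callback invocations happened.
func demo() int {
	n := &notifier{}
	calls := 0
	n.OnChange(func() {
		calls++
		// Re-entrant registration: fire holds no lock at this point.
		n.OnChange(func() { calls++ })
	})
	n.fire() // runs the first callback, which registers a second one
	n.fire() // runs both callbacks
	return calls
}

func main() {
	fmt.Println(demo())
}
```

Had `fire` invoked the callbacks while still holding the read lock, the nested `OnChange` call would block forever waiting for the write lock.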
+115
@@ -0,0 +1,115 @@
package netpol
import (
"context"
"log/slog"
"sync"
"time"
)
// LocalPodSource produces the set of local pods (with their HostIface and
// IPs) the reconciler should enforce policy for. The agent's allocation
// store + pod informer is the natural implementer.
//
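// The function is called from within a reconcile pass, while the
// Reconciler holds its own mutex (but no World lock). It should not block
// for long, and it must be safe for concurrent invocation if other
// callers use it as well.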
type LocalPodSource func() []Pod
// Reconciler turns the World cache + LocalPodSource into nft rule
// applications. One reconcile pass:
//
// pods + policies + namespaces → Translate → Render → Apply
//
// The pass runs on:
//
// - World.OnChange (any informer event), debounced through a single
// coalescing channel,
// - a periodic tick (default 30s) so we self-heal if the kernel
// ruleset diverges from desired (e.g. someone manually `nft flush`d),
// - and explicit Trigger() calls (the agent fires this from CNI ADD /
// DEL hooks so policy lands before pod traffic flows).
type Reconciler struct {
World *World
Local LocalPodSource
Applier *Applier
Logger *slog.Logger
Interval time.Duration
mu sync.Mutex
trigger chan struct{}
}
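// NewReconciler returns a Reconciler ready to Run and registers its
// Trigger as a World.OnChange callback. Interval is initialised to 30s;
// callers may override it before Run, but it must stay positive because
// time.NewTicker panics on a non-positive duration.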
func NewReconciler(world *World, local LocalPodSource, applier *Applier, logger *slog.Logger) *Reconciler {
r := &Reconciler{
World: world,
Local: local,
Applier: applier,
Logger: logger,
Interval: 30 * time.Second,
trigger: make(chan struct{}, 1),
}
world.OnChange(r.Trigger)
return r
}
// Trigger requests one reconcile pass. Coalesces — if a pass is already
// pending, the call is a no-op.
func (r *Reconciler) Trigger() {
select {
case r.trigger <- struct{}{}:
default:
}
}
// Run blocks until ctx is cancelled. Reconciles on Trigger or every
// Interval; calls Applier.Clear on shutdown.
func (r *Reconciler) Run(ctx context.Context) {
t := time.NewTicker(r.Interval)
defer t.Stop()
r.reconcile(ctx) // initial pass
for {
select {
case <-ctx.Done():
// Best-effort: drop our table on graceful exit. If the agent
// crashed without doing this, the next agent's first apply
// will replace the stale table atomically anyway.
_ = r.Applier.Clear(context.Background())
return
case <-t.C:
r.reconcile(ctx)
case <-r.trigger:
r.reconcile(ctx)
}
}
}
func (r *Reconciler) reconcile(ctx context.Context) {
r.mu.Lock()
defer r.mu.Unlock()
in := Inputs{
LocalPods: r.Local(),
PeerPods: r.World.snapshotPeerPods(),
Namespaces: r.World.snapshotNamespaces(),
Policies: r.World.snapshotPolicies(),
}
out, err := Translate(in, func(s string) { r.Logger.Warn(s) })
if err != nil {
r.Logger.Warn("netpol translate failed", "err", err)
return
}
script := Render(out)
if err := r.Applier.Apply(ctx, script); err != nil {
r.Logger.Warn("netpol apply failed", "err", err)
return
}
if len(out.Isolated) > 0 {
r.Logger.Info("netpol applied",
"isolated_chains", len(out.Isolated),
"rules", len(out.Rules),
"local_pods", len(in.LocalPods),
"policies", len(in.Policies))
}
}
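The coalescing behaviour of `Trigger` comes from a buffered channel of capacity one combined with a non-blocking send. The sketch below isolates that idea; `coalescer` and its `drain` helper are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// coalescer is a hypothetical reduction of the reconciler's trigger channel.
type coalescer struct {
	ch chan struct{}
}

func newCoalescer() *coalescer {
	return &coalescer{ch: make(chan struct{}, 1)}
}

// Trigger never blocks: if a request is already queued, the new one is
// dropped, because the pending pass will observe the latest state anyway.
func (c *coalescer) Trigger() {
	select {
	case c.ch <- struct{}{}:
	default:
	}
}

// drain counts how many passes a consumer would run for the queued requests.
func (c *coalescer) drain() int {
	n := 0
	for {
		select {
		case <-c.ch:
			n++
		default:
			return n
		}
	}
}

func main() {
	c := newCoalescer()
	for i := 0; i < 5; i++ {
		c.Trigger()
	}
	fmt.Println(c.drain()) // five triggers collapse into one pending pass
	fmt.Println(c.drain()) // nothing left
}
```

This is why an informer burst (for example, many pod updates during a rollout) costs at most one extra reconcile pass rather than one per event.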
+160
@@ -0,0 +1,160 @@
package netpol
import (
"context"
"io"
"log/slog"
"net"
"strings"
"sync"
"sync/atomic"
"testing"
corev1 "k8s.io/api/core/v1"
netv1 "k8s.io/api/networking/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// fakeApplier captures Apply calls for assertion. It mirrors the
// (Apply, Clear) method pair of *Applier so reconcileOnce can use it in
// place of the real thing.
type fakeApplier struct {
mu sync.Mutex
calls []string
last string
err error
}
func (f *fakeApplier) Apply(_ context.Context, script string) error {
f.mu.Lock()
defer f.mu.Unlock()
if f.err != nil {
return f.err
}
if script == f.last {
return nil // de-dup like the real Applier
}
f.last = script
f.calls = append(f.calls, script)
return nil
}
func (f *fakeApplier) Clear(_ context.Context) error { return nil }
func (f *fakeApplier) lastScript() string {
f.mu.Lock()
defer f.mu.Unlock()
return f.last
}
func (f *fakeApplier) callCount() int {
f.mu.Lock()
defer f.mu.Unlock()
return len(f.calls)
}
// applierIface is satisfied by *Applier and *fakeApplier. Reconciler
// itself holds a concrete *Applier, so tests go through reconcileOnce,
// which accepts this interface instead.
type applierIface interface {
Apply(context.Context, string) error
Clear(context.Context) error
}
// reconcileOnce drives one pass synchronously without spinning a goroutine.
func reconcileOnce(t *testing.T, world *World, local LocalPodSource, app applierIface) {
t.Helper()
in := Inputs{
LocalPods: local(),
PeerPods: world.snapshotPeerPods(),
Namespaces: world.snapshotNamespaces(),
Policies: world.snapshotPolicies(),
}
out, err := Translate(in, nil)
if err != nil {
t.Fatal(err)
}
if err := app.Apply(context.Background(), Render(out)); err != nil {
t.Fatal(err)
}
}
// silentLogger returns a slog.Logger discarding everything — keeps test
// output tidy.
func silentLogger() *slog.Logger {
return slog.New(slog.NewTextHandler(io.Discard, &slog.HandlerOptions{}))
}
func TestReconciler_NoIsolatedPods_ShortScript(t *testing.T) {
world := NewWorld(silentLogger())
local := func() []Pod { return nil }
app := &fakeApplier{}
reconcileOnce(t, world, local, app)
got := app.lastScript()
if !strings.Contains(got, "table inet flock_netpol") {
t.Fatalf("missing table:\n%s", got)
}
// Without any isolated pods the base chain has policy accept and no
// jumps. That's the desired "open" state.
if strings.Contains(got, "jump pod_") {
t.Fatalf("unexpected jump in open state:\n%s", got)
}
}
func TestReconciler_PolicyIsolatesLocalPod(t *testing.T) {
world := NewWorld(silentLogger())
// Seed a default-deny policy in ns1.
world.onPolicy(&netv1.NetworkPolicy{
ObjectMeta: metav1.ObjectMeta{Namespace: "ns1", Name: "deny-all"},
Spec: netv1.NetworkPolicySpec{
PodSelector: metav1.LabelSelector{},
PolicyTypes: []netv1.PolicyType{netv1.PolicyTypeIngress},
},
}, false)
local := func() []Pod {
return []Pod{{
Namespace: "ns1", Name: "web",
Labels: map[string]string{"app": "web"},
HostIface: "flock00000001",
IPs: []net.IP{mustIP("2001:db8::1")},
}}
}
app := &fakeApplier{}
reconcileOnce(t, world, local, app)
got := app.lastScript()
if !strings.Contains(got, "_ingress {") {
t.Fatalf("expected pod ingress chain:\n%s", got)
}
if !strings.Contains(got, "drop") {
t.Fatalf("expected default-deny drop:\n%s", got)
}
if !strings.Contains(got, `oifname "flock00000001"`) {
t.Fatalf("expected base-chain jump anchored on veth:\n%s", got)
}
}
func TestReconciler_DedupesIdenticalRender(t *testing.T) {
world := NewWorld(silentLogger())
local := func() []Pod {
return []Pod{{
Namespace: "ns1", Name: "web", HostIface: "f1",
IPs: []net.IP{mustIP("2001:db8::1")},
}}
}
app := &fakeApplier{}
reconcileOnce(t, world, local, app)
reconcileOnce(t, world, local, app)
reconcileOnce(t, world, local, app)
if got := app.callCount(); got != 1 {
t.Fatalf("expected 1 unique apply, got %d", got)
}
}
func TestReconciler_OnChangeFiresTrigger(t *testing.T) {
world := NewWorld(silentLogger())
var triggered atomic.Int32
world.OnChange(func() { triggered.Add(1) })
world.onNamespace(&corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: "foo"}}, false)
world.onPolicy(&netv1.NetworkPolicy{ObjectMeta: metav1.ObjectMeta{Namespace: "foo", Name: "p"}}, false)
if triggered.Load() != 2 {
t.Fatalf("expected 2 OnChange calls, got %d", triggered.Load())
}
}
+322
@@ -0,0 +1,322 @@
package netpol
import (
"fmt"
"hash/fnv"
"net"
"sort"
"strings"
)
// Render produces an nftables script that, when applied with `nft -f -`,
// installs the desired NetworkPolicy enforcement state for this node.
//
// Layout:
//
// table inet flock_netpol {
// chain forward { # base chain on hook forward
// type filter hook forward priority filter; policy accept;
// # one jump per (pod, direction) that has rules and/or isolation
// iifname "flock1a2b3c4d" ip6 saddr 2001:db8::1 jump pod_<hash>_egress
// oifname "flock1a2b3c4d" ip6 daddr 2001:db8::1 jump pod_<hash>_ingress
// }
// chain pod_<hash>_ingress { # one per isolated direction
// # explicit allow lines (empty for default-deny)
// drop
// }
// chain pod_<hash>_egress { ... }
// }
//
// The whole table is replaced atomically: the script starts with
// "destroy table inet flock_netpol" (which, unlike "delete table", does not
// fail when the table is absent) and then re-declares the table and its
// chains. nft executes the script as a single transaction, so the ruleset
// is never left partially applied.
//
// Output is deterministic: equal Output → byte-identical script. The
// reconciler relies on this for de-dup.
func Render(out Output) string {
var sb strings.Builder
sb.WriteString("# Generated by flock-agent netpol; do not edit by hand.\n")
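// Drop any previous table first. "destroy" (unlike "delete") succeeds
// even when the table does not exist, e.g. on first run; it needs an
// nftables release and kernel recent enough to support the command. The
// table block below then recreates everything.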
sb.WriteString("destroy table inet flock_netpol\n")
sb.WriteString("table inet flock_netpol {\n")
// Build per-(pod, direction) chains. We need them defined BEFORE the
// base chain references them, so we render chains first.
chains := buildChains(out)
for _, c := range chains {
writeChain(&sb, c)
}
// Base chain emits jumps in a stable order (chain name asc).
sb.WriteString("\tchain forward {\n")
sb.WriteString("\t\ttype filter hook forward priority filter; policy accept;\n")
for _, c := range chains {
writeBaseJump(&sb, c)
}
sb.WriteString("\t}\n")
sb.WriteString("}\n")
return sb.String()
}
// chain is one rendered chain — one direction of one pod.
type chain struct {
name string // pod_<hash>_ingress / _egress
hostIface string
podIPs []net.IP
direction Direction
rules []Rule
policy string // "drop" or "accept"
}
// buildChains groups rules by (PodKey, Direction) and adds default-deny
// chains for isolated directions that received no explicit rules.
func buildChains(out Output) []chain {
type key struct {
podKey string
dir Direction
}
byKey := map[key]*chain{}
// Seed isolated directions with empty chains so default-deny lands
// even when no explicit allow rule was emitted for them.
for iso := range out.Isolated {
byKey[key{podKey: iso.PodKey, dir: iso.Direction}] = &chain{
direction: iso.Direction,
policy: "drop",
}
}
// Append rules into their chain. Rule.PodIPs and HostIface are
// authoritative — every rule for a given pod carries the same values
// (translator invariant), so we copy from the first.
for _, r := range out.Rules {
k := key{podKey: r.PodKey, dir: r.Direction}
c := byKey[k]
if c == nil {
// Rule for a non-isolated direction shouldn't happen in
// practice (translator only emits rules for selected pods)
// but be tolerant — the chain just gets policy accept.
c = &chain{direction: r.Direction, policy: "accept"}
byKey[k] = c
}
c.rules = append(c.rules, r)
if c.hostIface == "" {
c.hostIface = r.HostIface
c.podIPs = append([]net.IP(nil), r.PodIPs...)
}
}
// If a chain was created from Isolated only (no rules), look up the
// pod's HostIface + IPs from Output.Pods. This is the path a
// default-deny policy takes — no allow rules, only isolation.
for k, c := range byKey {
if c.hostIface != "" {
continue
}
if lp, ok := out.Pods[k.podKey]; ok {
c.hostIface = lp.HostIface
c.podIPs = append([]net.IP(nil), lp.IPs...)
continue
}
// Last resort: lift from any rule sharing the PodKey. Should
// not normally happen — the translator populates Pods for every
// isolated pod — but defends against partially-populated Output
// values constructed by tests.
for _, r := range out.Rules {
if r.PodKey == k.podKey {
c.hostIface = r.HostIface
c.podIPs = append([]net.IP(nil), r.PodIPs...)
break
}
}
}
// Materialise chain names and emit in deterministic order.
var chains []chain
for k, c := range byKey {
if c.hostIface == "" {
continue // can't jump to it; skip
}
c.name = chainName(k.podKey, c.direction)
chains = append(chains, *c)
}
sort.Slice(chains, func(i, j int) bool { return chains[i].name < chains[j].name })
return chains
}
// chainName produces a stable, name-safe chain identifier. Pod keys can
// contain characters nft doesn't allow in identifiers, so we hash them.
// Direction keeps ingress and egress separate.
func chainName(podKey string, dir Direction) string {
h := fnv.New64a()
_, _ = h.Write([]byte(podKey))
return fmt.Sprintf("pod_%016x_%s", h.Sum64(), dir)
}
// writeChain emits the chain definition. Chains with no allow rules exist
// deliberately: regular (non-base) nft chains cannot carry a policy, so
// the trailing "drop" rule IS the default-deny.
func writeChain(sb *strings.Builder, c chain) {
fmt.Fprintf(sb, "\tchain %s {\n", c.name)
for _, r := range c.rules {
writeAllowRule(sb, r)
}
if c.policy == "drop" {
sb.WriteString("\t\tdrop\n")
}
sb.WriteString("\t}\n")
}
// writeAllowRule emits one accept line:
//
// [ip|ip6 saddr {peers}] [ip|ip6 saddr != {except}] [proto dport {port|port-end}] accept
//
// The saddr / daddr field flips based on direction (ingress = from peer →
// match saddr; egress = to peer → match daddr).
func writeAllowRule(sb *strings.Builder, r Rule) {
v6Peers, v4Peers := splitFamily(r.PeerCIDRs)
v6Except, v4Except := splitFamily(r.PeerExcept)
v6Pod, v4Pod := splitIPFamily(r.PodIPs)
hasPeerFilter := len(r.PeerCIDRs) > 0
emit := func(family string, peers, except []*net.IPNet, podIP net.IP) {
if hasPeerFilter && len(peers) == 0 && len(except) == 0 {
// Peer filter exists but no entries of this family — rule
// must not match anything for this family.
return
}
if podIP == nil {
// Pod has no address of this family; nothing to guard.
return
}
for _, port := range r.Ports {
sb.WriteString("\t\t")
// Peer (saddr/daddr) match: address is "peer's address",
// which is saddr on ingress and daddr on egress.
peerField := peerAddrField(family, r.Direction)
if hasPeerFilter && len(peers) > 0 {
fmt.Fprintf(sb, "%s { %s } ", peerField, joinCIDRs(peers))
}
if hasPeerFilter && len(except) > 0 {
fmt.Fprintf(sb, "%s != { %s } ", peerField, joinCIDRs(except))
}
// Port match.
writePortMatch(sb, port)
fmt.Fprintf(sb, "%s\n", r.Action)
}
}
emit("ip6", v6Peers, v6Except, v6Pod)
emit("ip", v4Peers, v4Except, v4Pod)
}
// peerAddrField returns "ip6 saddr" / "ip saddr" / "ip6 daddr" / "ip daddr"
// depending on family + direction. Ingress matches the peer as the source;
// egress matches the peer as the destination.
func peerAddrField(family string, dir Direction) string {
switch {
case dir == DirIngress:
return family + " saddr"
default:
return family + " daddr"
}
}
// writePortMatch appends "tcp dport 80 " (single port),
// "tcp dport 8000-8999 " (range), "meta l4proto tcp " (protocol without a
// port), or nothing when both port and protocol are unset ("any").
func writePortMatch(sb *strings.Builder, p PortMatch) {
if p.Port == 0 && p.Protocol == "" {
return
}
proto := p.Protocol
if proto == "" {
proto = "tcp"
}
if p.Port == 0 {
// Protocol-only match. nft has `meta l4proto tcp`.
fmt.Fprintf(sb, "meta l4proto %s ", proto)
return
}
if p.EndPort > p.Port {
fmt.Fprintf(sb, "%s dport %d-%d ", proto, p.Port, p.EndPort)
return
}
fmt.Fprintf(sb, "%s dport %d ", proto, p.Port)
}
// writeBaseJump emits one line per (pod, direction) chain in the base
// `forward` chain. The match is anchored on the host-side veth name so
// the rule only fires for traffic that genuinely crosses this pod's veth.
//
// We additionally constrain on the pod's address (saddr for egress, daddr
// for ingress) so a packet that somehow hits the wrong veth — e.g. during
// a CNI ADD race — won't be policy-evaluated against the wrong pod.
func writeBaseJump(sb *strings.Builder, c chain) {
v6, v4 := splitIPFamily(c.podIPs)
emit := func(family string, ip net.IP) {
if ip == nil {
return
}
var iface, addrField, addrStr string
if c.direction == DirEgress {
iface = "iifname"
addrField = family + " saddr"
} else {
iface = "oifname"
addrField = family + " daddr"
}
if family == "ip" {
addrStr = ip.To4().String()
} else {
addrStr = ip.To16().String()
}
fmt.Fprintf(sb, "\t\t%s \"%s\" %s %s jump %s\n", iface, c.hostIface, addrField, addrStr, c.name)
}
emit("ip6", v6)
emit("ip", v4)
}
// splitFamily partitions CIDRs into (v6, v4) lists, preserving order
// within each family.
func splitFamily(cs []*net.IPNet) ([]*net.IPNet, []*net.IPNet) {
var v6, v4 []*net.IPNet
for _, c := range cs {
if c.IP.To4() != nil {
v4 = append(v4, c)
} else {
v6 = append(v6, c)
}
}
return v6, v4
}
// splitIPFamily picks the first v6 and the first v4 address from a list of
// pod IPs (a pod has at most one of each in flock's model).
func splitIPFamily(ips []net.IP) (v6, v4 net.IP) {
for _, ip := range ips {
if ip == nil {
continue
}
if ip.To4() != nil {
if v4 == nil {
v4 = ip
}
} else {
if v6 == nil {
v6 = ip
}
}
}
return
}
func joinCIDRs(cs []*net.IPNet) string {
parts := make([]string, len(cs))
for i, c := range cs {
parts[i] = c.String()
}
sort.Strings(parts)
return strings.Join(parts, ", ")
}
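The allow-line grammar documented on `writeAllowRule`, `peerAddrField`, and `writePortMatch` can be condensed into one standalone function. `allowLine` below is a hypothetical simplification (single address family, no except list, protocol always given, action fixed to accept) meant only to show how direction flips the address field and how ports and ranges are formatted.

```go
package main

import (
	"fmt"
	"strings"
)

// allowLine renders one simplified accept rule. On ingress the peer is the
// packet source (saddr); on egress it is the destination (daddr).
func allowLine(family string, ingress bool, peers []string, proto string, port, endPort int) string {
	var sb strings.Builder
	field := family + " daddr"
	if ingress {
		field = family + " saddr"
	}
	if len(peers) > 0 {
		fmt.Fprintf(&sb, "%s { %s } ", field, strings.Join(peers, ", "))
	}
	switch {
	case port == 0:
		// No port constraint: emit nothing.
	case endPort > port:
		fmt.Fprintf(&sb, "%s dport %d-%d ", proto, port, endPort)
	default:
		fmt.Fprintf(&sb, "%s dport %d ", proto, port)
	}
	sb.WriteString("accept")
	return sb.String()
}

func main() {
	fmt.Println(allowLine("ip6", true, []string{"2001:db8::a/128"}, "tcp", 80, 0))
	fmt.Println(allowLine("ip6", false, []string{"2001:db8::aa/128"}, "tcp", 53, 0))
	fmt.Println(allowLine("ip", true, nil, "tcp", 8000, 8999))
}
```

The first two lines correspond to the strings asserted in TestRender_PortAndPeer and TestRender_EgressDirection; the third shows a range with no peer filter.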
+219
@@ -0,0 +1,219 @@
package netpol
import (
"net"
"strings"
"testing"
)
// TestRender_DefaultDeny — an isolated direction with no rules renders
// to a chain whose last action is "drop".
func TestRender_DefaultDeny(t *testing.T) {
out := Output{
Isolated: map[Isolation]struct{}{
{PodKey: "ns/web", Direction: DirIngress}: {},
},
// No allow rules at all: the chain's HostIface + IPs come from
// Output.Pods, which is the path a pure default-deny policy takes.
Pods: map[string]LocalPod{
"ns/web": {PodKey: "ns/web", HostIface: "flock00000001",
IPs: []net.IP{mustIP("2001:db8::1")}},
},
}
got := Render(out)
if !strings.Contains(got, "table inet flock_netpol") {
t.Fatalf("missing table:\n%s", got)
}
if !strings.Contains(got, "type filter hook forward") {
t.Fatalf("missing base chain:\n%s", got)
}
if !strings.Contains(got, "drop") {
t.Fatalf("expected default-deny drop in chain:\n%s", got)
}
// Pod chain name must be deterministic-looking (pod_<hex>_ingress).
if !strings.Contains(got, "_ingress {") {
t.Fatalf("missing pod ingress chain:\n%s", got)
}
// Base chain jump anchored on veth + pod IP.
if !strings.Contains(got, `oifname "flock00000001"`) {
t.Fatalf("missing veth match in base chain:\n%s", got)
}
if !strings.Contains(got, "ip6 daddr 2001:db8::1") {
t.Fatalf("missing pod IP match in base chain:\n%s", got)
}
}
// TestRender_DualStack — pod with both v6 + v4 IPs gets two base-chain
// jumps.
func TestRender_DualStack(t *testing.T) {
out := Output{
Isolated: map[Isolation]struct{}{
{PodKey: "ns/web", Direction: DirIngress}: {},
},
Rules: []Rule{{
PodKey: "ns/web", HostIface: "f1",
PodIPs: []net.IP{mustIP("2001:db8::1"), mustIP("10.0.0.1")},
Direction: DirIngress, Action: ActionAccept,
Ports: []PortMatch{{Protocol: "tcp", Port: 80}},
}},
}
got := Render(out)
if !strings.Contains(got, "ip6 daddr 2001:db8::1") {
t.Fatalf("missing v6 jump:\n%s", got)
}
if !strings.Contains(got, "ip daddr 10.0.0.1") {
t.Fatalf("missing v4 jump:\n%s", got)
}
}
// TestRender_PortAndPeer — a Rule with peer + port emits a syntactically
// well-formed allow line.
func TestRender_PortAndPeer(t *testing.T) {
out := Output{
Isolated: map[Isolation]struct{}{
{PodKey: "ns/web", Direction: DirIngress}: {},
},
Rules: []Rule{{
PodKey: "ns/web", HostIface: "f1",
PodIPs: []net.IP{mustIP("2001:db8::1")},
Direction: DirIngress, Action: ActionAccept,
PeerCIDRs: []*net.IPNet{mustNet("2001:db8::a/128")},
Ports: []PortMatch{{Protocol: "tcp", Port: 80}},
}},
}
got := Render(out)
if !strings.Contains(got, "ip6 saddr { 2001:db8::a/128 } tcp dport 80 accept") {
t.Fatalf("expected ingress allow with v6 peer + tcp/80:\n%s", got)
}
}
// TestRender_PortRange — endPort renders as "8000-8999".
func TestRender_PortRange(t *testing.T) {
out := Output{
Isolated: map[Isolation]struct{}{
{PodKey: "ns/web", Direction: DirIngress}: {},
},
Rules: []Rule{{
PodKey: "ns/web", HostIface: "f1",
PodIPs: []net.IP{mustIP("2001:db8::1")},
Direction: DirIngress, Action: ActionAccept,
PeerCIDRs: []*net.IPNet{mustNet("0.0.0.0/0"), mustNet("::/0")},
Ports: []PortMatch{{Protocol: "tcp", Port: 8000, EndPort: 8999}},
}},
}
got := Render(out)
if !strings.Contains(got, "tcp dport 8000-8999") {
t.Fatalf("expected port range:\n%s", got)
}
}
// TestRender_IPBlockExcept — except produces a "saddr != { … }" guard.
func TestRender_IPBlockExcept(t *testing.T) {
out := Output{
Isolated: map[Isolation]struct{}{
{PodKey: "ns/web", Direction: DirIngress}: {},
},
Rules: []Rule{{
PodKey: "ns/web", HostIface: "f1",
PodIPs: []net.IP{mustIP("10.0.0.1")},
Direction: DirIngress, Action: ActionAccept,
PeerCIDRs: []*net.IPNet{mustNet("10.0.0.0/8")},
PeerExcept: []*net.IPNet{mustNet("10.99.0.0/16")},
Ports: []PortMatch{{}},
}},
}
got := Render(out)
if !strings.Contains(got, "ip saddr { 10.0.0.0/8 }") {
t.Fatalf("expected ipBlock cidr:\n%s", got)
}
if !strings.Contains(got, "ip saddr != { 10.99.0.0/16 }") {
t.Fatalf("expected ipBlock except:\n%s", got)
}
}
// TestRender_AllowAllPeers — empty PeerCIDRs/PeerExcept means "any peer";
// the rule should emit an unconditional accept (modulo port).
func TestRender_AllowAllPeers(t *testing.T) {
out := Output{
Isolated: map[Isolation]struct{}{
{PodKey: "ns/web", Direction: DirIngress}: {},
},
Rules: []Rule{{
PodKey: "ns/web", HostIface: "f1",
PodIPs: []net.IP{mustIP("2001:db8::1")},
Direction: DirIngress, Action: ActionAccept,
Ports: []PortMatch{{Protocol: "tcp", Port: 443}},
}},
}
got := Render(out)
if !strings.Contains(got, "tcp dport 443 accept") {
t.Fatalf("expected unconditional tcp/443 allow:\n%s", got)
}
// Should NOT have a saddr/daddr filter (empty peers).
if strings.Contains(got, "ip6 saddr {") || strings.Contains(got, "ip saddr {") {
t.Fatalf("expected no peer filter:\n%s", got)
}
}
// TestRender_Determinism — same input → byte-identical output.
func TestRender_Determinism(t *testing.T) {
out := Output{
Isolated: map[Isolation]struct{}{
{PodKey: "ns/web", Direction: DirIngress}: {},
{PodKey: "ns/db", Direction: DirEgress}: {},
},
Rules: []Rule{
{PodKey: "ns/web", HostIface: "f1", PodIPs: []net.IP{mustIP("2001:db8::1")},
Direction: DirIngress, Action: ActionAccept,
PeerCIDRs: []*net.IPNet{mustNet("2001:db8::5/128"), mustNet("2001:db8::3/128")},
Ports: []PortMatch{{Protocol: "tcp", Port: 80}}},
{PodKey: "ns/db", HostIface: "f2", PodIPs: []net.IP{mustIP("2001:db8::2")},
Direction: DirEgress, Action: ActionAccept,
PeerCIDRs: []*net.IPNet{mustNet("2001:db8::aa/128")},
Ports: []PortMatch{{}}},
},
}
a := Render(out)
b := Render(out)
if a != b {
t.Fatalf("Render not deterministic:\nA=\n%s\nB=\n%s", a, b)
}
// And peers in the rule must be sorted (we deliberately gave 5 then 3).
if strings.Index(a, "2001:db8::3/128") > strings.Index(a, "2001:db8::5/128") {
t.Fatalf("peer CIDRs not sorted within rule:\n%s", a)
}
}
// TestRender_EgressDirection — egress rules use iifname + saddr (pod-side).
func TestRender_EgressDirection(t *testing.T) {
out := Output{
Isolated: map[Isolation]struct{}{
{PodKey: "ns/web", Direction: DirEgress}: {},
},
Rules: []Rule{{
PodKey: "ns/web", HostIface: "f1",
PodIPs: []net.IP{mustIP("2001:db8::1")},
Direction: DirEgress, Action: ActionAccept,
PeerCIDRs: []*net.IPNet{mustNet("2001:db8::aa/128")},
Ports: []PortMatch{{Protocol: "tcp", Port: 53}},
}},
}
got := Render(out)
// Base-chain jump for egress matches iifname + ip6 saddr (pod's IP).
if !strings.Contains(got, `iifname "f1" ip6 saddr 2001:db8::1`) {
t.Fatalf("missing egress base-chain jump:\n%s", got)
}
// Peer filter for egress matches the *destination* (the peer is downstream).
if !strings.Contains(got, "ip6 daddr { 2001:db8::aa/128 }") {
t.Fatalf("expected daddr peer filter for egress:\n%s", got)
}
}
func mustNet(s string) *net.IPNet {
_, n, err := net.ParseCIDR(s)
if err != nil {
panic(err)
}
return n
}
+443
@@ -0,0 +1,443 @@
package netpol
import (
"fmt"
"net"
"sort"
netv1 "k8s.io/api/networking/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/labels"
)
// Inputs is the world-view the translator consumes. All fields are owned
// by the caller; the translator does not mutate them.
type Inputs struct {
// LocalPods are the pods scheduled on this node that have a committed
// flock allocation. Only these pods get rules — peers may live
// elsewhere.
LocalPods []Pod
// PeerPods is the cluster-wide pod set used to resolve podSelector +
// namespaceSelector peers. It is fine to include the local pods here
// too; duplicates are deduped by (namespace, name).
PeerPods []PeerPod
// Namespaces is the cluster's full Namespace set. Used for
// namespaceSelector matching.
Namespaces []Namespace
// Policies is every NetworkPolicy in the cluster. The translator
// filters down to those that select at least one local pod.
Policies []netv1.NetworkPolicy
}
// Output is the result of one translation pass.
type Output struct {
// Rules is the flat ordered list of allow rules to render. The
// renderer groups them by (PodKey, Direction) into chains.
Rules []Rule
// Isolated is the set of (PodKey, Direction) pairs whose chain must
// have a default-deny policy. A pod selected by at least one policy
// in a given direction shows up here. The renderer uses this to
// decide whether to emit a chain at all and what its base policy is.
Isolated map[Isolation]struct{}
// Pods carries the HostIface + IPs for every local pod referenced
// by the policy world, including pods that produced only isolation
// (default-deny) without any allow rules. The renderer needs this
// because such a pod has no Rule to lift the HostIface from.
Pods map[string]LocalPod // key = namespace/name
}
// Isolation is the (PodKey, Direction) key of the Isolated map.
type Isolation struct {
PodKey string
Direction Direction
}
// Translate runs the translation pass. It is a pure function: the same
// Inputs always produce a semantically equal Output. Rules are emitted in
// LocalPods order, and within one pod in the order of the selecting
// policies, which canonicalisePolicies sorts by (namespace, name).
//
// Errors are returned only for unrecoverable malformed input; per-rule
// translation errors are logged via warn and skipped so that a single
// broken policy can't take down enforcement for a whole node. The optional
// warn callback is invoked for each skipped sub-rule with a human-readable
// message. Pass nil to silently drop.
func Translate(in Inputs, warn func(string)) (Output, error) {
if warn == nil {
warn = func(string) {}
}
out := Output{
Isolated: map[Isolation]struct{}{},
Pods: map[string]LocalPod{},
}
policies := canonicalisePolicies(in.Policies)
nsByName := indexNamespaces(in.Namespaces)
peerPodsByNS := indexPeerPods(in.PeerPods)
for _, pod := range in.LocalPods {
if len(pod.IPs) == 0 {
continue // no allocation yet; translator skips
}
key := pod.Namespace + "/" + pod.Name
// Find every policy in pod.Namespace whose podSelector matches.
// A NetworkPolicy is namespaced: its podSelector only ever selects
// pods in the policy's own namespace, as the spec defines.
for _, p := range policies {
if p.Namespace != pod.Namespace {
continue
}
sel, err := metav1.LabelSelectorAsSelector(&p.Spec.PodSelector)
if err != nil {
warn(fmt.Sprintf("policy %s/%s: invalid podSelector: %v", p.Namespace, p.Name, err))
continue
}
if !sel.Matches(labels.Set(pod.Labels)) {
continue
}
ingress, egress := policyDirections(&p)
if ingress || egress {
out.Pods[key] = LocalPod{
PodKey: key,
HostIface: pod.HostIface,
IPs: append([]net.IP(nil), pod.IPs...),
}
}
if ingress {
out.Isolated[Isolation{PodKey: key, Direction: DirIngress}] = struct{}{}
}
if egress {
out.Isolated[Isolation{PodKey: key, Direction: DirEgress}] = struct{}{}
}
// Translate ingress rules.
if ingress {
for ri, r := range p.Spec.Ingress {
rules, err := buildIngressRules(pod, r, p.Namespace, nsByName, peerPodsByNS)
if err != nil {
warn(fmt.Sprintf("policy %s/%s ingress[%d]: %v", p.Namespace, p.Name, ri, err))
continue
}
out.Rules = append(out.Rules, rules...)
}
}
// Translate egress rules.
if egress {
for ri, r := range p.Spec.Egress {
rules, err := buildEgressRules(pod, r, p.Namespace, nsByName, peerPodsByNS)
if err != nil {
warn(fmt.Sprintf("policy %s/%s egress[%d]: %v", p.Namespace, p.Name, ri, err))
continue
}
out.Rules = append(out.Rules, rules...)
}
}
}
}
return out, nil
}
// policyDirections reports which directions a NetworkPolicy isolates.
//
// Per the spec, the PolicyTypes field is the source of truth when set;
// when omitted, isolation is inferred from which rule lists are populated
// (Ingress always; Egress only if Spec.Egress is non-empty).
func policyDirections(p *netv1.NetworkPolicy) (ingress, egress bool) {
if len(p.Spec.PolicyTypes) > 0 {
for _, t := range p.Spec.PolicyTypes {
switch t {
case netv1.PolicyTypeIngress:
ingress = true
case netv1.PolicyTypeEgress:
egress = true
}
}
return
}
ingress = true
egress = len(p.Spec.Egress) > 0
return
}
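The inference rule in policyDirections can be sketched in isolation. This is a minimal stdlib-only illustration, not the package's code: the string constants and the `directions` helper are hypothetical stand-ins for the netv1 policy types and for the `len(p.Spec.Egress) > 0` check.

```go
package main

import "fmt"

// Hypothetical stand-ins for netv1.PolicyTypeIngress / PolicyTypeEgress.
const (
	typeIngress = "Ingress"
	typeEgress  = "Egress"
)

// directions mirrors the inference rule: explicit policyTypes win;
// otherwise ingress is always isolated and egress only when the policy
// carries at least one egress rule.
func directions(policyTypes []string, hasEgressRules bool) (ingress, egress bool) {
	if len(policyTypes) > 0 {
		for _, t := range policyTypes {
			switch t {
			case typeIngress:
				ingress = true
			case typeEgress:
				egress = true
			}
		}
		return ingress, egress
	}
	return true, hasEgressRules
}

func main() {
	in, eg := directions(nil, false) // PolicyTypes omitted, no egress rules
	fmt.Println(in, eg)
	in, eg = directions([]string{typeEgress}, false) // explicit egress-only policy
	fmt.Println(in, eg)
}
```

An explicit egress-only policy isolates egress even with an empty egress list, which is how an egress default-deny is expressed.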
// buildIngressRules expands one NetworkPolicyIngressRule into Rule(s).
// One Rule per allowed peer-set; each Rule carries the full Ports filter
// from the source rule.
func buildIngressRules(
pod Pod,
r netv1.NetworkPolicyIngressRule,
policyNS string,
nsByName map[string]Namespace,
peerPodsByNS map[string][]PeerPod,
) ([]Rule, error) {
ports, err := translatePorts(r.Ports)
if err != nil {
return nil, err
}
peers, err := translatePeers(r.From, policyNS, nsByName, peerPodsByNS)
if err != nil {
return nil, err
}
return assembleRules(pod, DirIngress, peers, ports), nil
}
// buildEgressRules is the egress mirror of buildIngressRules.
func buildEgressRules(
pod Pod,
r netv1.NetworkPolicyEgressRule,
policyNS string,
nsByName map[string]Namespace,
peerPodsByNS map[string][]PeerPod,
) ([]Rule, error) {
ports, err := translatePorts(r.Ports)
if err != nil {
return nil, err
}
peers, err := translatePeers(r.To, policyNS, nsByName, peerPodsByNS)
if err != nil {
return nil, err
}
return assembleRules(pod, DirEgress, peers, ports), nil
}
// peerSet is the resolved peer information for one rule's From / To list.
type peerSet struct {
// allowAll is true when the rule has no peers at all (an empty From /
// To list, which the spec defines as "from anywhere"). It overrides
// CIDRs and Except.
allowAll bool
// CIDRs is the union of every IP / CIDR contributed by the rule's
// peer entries (resolved Pod IPs, namespace pods, and ipBlock.cidr).
CIDRs []*net.IPNet
// Except is the union of every ipBlock.except entry across the rule.
Except []*net.IPNet
}
// translatePeers resolves a list of NetworkPolicyPeer entries into a
// peerSet. Each peer entry contributes either CIDRs (resolved from
// pod / namespace selectors, or copied from ipBlock) or Except entries.
func translatePeers(
peers []netv1.NetworkPolicyPeer,
policyNS string,
nsByName map[string]Namespace,
peerPodsByNS map[string][]PeerPod,
) (peerSet, error) {
if len(peers) == 0 {
return peerSet{allowAll: true}, nil
}
out := peerSet{}
for i, p := range peers {
switch {
case p.IPBlock != nil:
_, cidr, err := net.ParseCIDR(p.IPBlock.CIDR)
if err != nil {
return peerSet{}, fmt.Errorf("peer[%d] ipBlock.cidr %q: %w", i, p.IPBlock.CIDR, err)
}
out.CIDRs = append(out.CIDRs, cidr)
for j, ex := range p.IPBlock.Except {
_, exNet, err := net.ParseCIDR(ex)
if err != nil {
return peerSet{}, fmt.Errorf("peer[%d] ipBlock.except[%d] %q: %w", i, j, ex, err)
}
out.Except = append(out.Except, exNet)
}
case p.PodSelector != nil || p.NamespaceSelector != nil:
ips, err := resolvePodNamespacePeer(p, policyNS, nsByName, peerPodsByNS)
if err != nil {
return peerSet{}, fmt.Errorf("peer[%d]: %w", i, err)
}
out.CIDRs = append(out.CIDRs, ips...)
default:
return peerSet{}, fmt.Errorf("peer[%d] is empty (must set ipBlock, podSelector, or namespaceSelector)", i)
}
}
return out, nil
}
// resolvePodNamespacePeer walks the cluster's peer-pod set and returns
// /128 (v6) and /32 (v4) CIDRs for each pod that matches the (possibly
// combined) pod + namespace selectors.
//
// Selector semantics from the NetworkPolicy spec:
//
// - podSelector + namespaceSelector both nil → handled upstream.
// - podSelector set, namespaceSelector nil → match in the policy's
// own namespace.
// - podSelector nil, namespaceSelector set → match every pod in
// namespaces that match the namespaceSelector.
// - both set → AND: pod must be in a matching namespace AND match
// the podSelector.
//
// An empty (non-nil) selector matches everything in scope.
func resolvePodNamespacePeer(
p netv1.NetworkPolicyPeer,
policyNS string,
nsByName map[string]Namespace,
peerPodsByNS map[string][]PeerPod,
) ([]*net.IPNet, error) {
var podSel, nsSel labels.Selector
if p.PodSelector != nil {
s, err := metav1.LabelSelectorAsSelector(p.PodSelector)
if err != nil {
return nil, fmt.Errorf("podSelector: %w", err)
}
podSel = s
}
if p.NamespaceSelector != nil {
s, err := metav1.LabelSelectorAsSelector(p.NamespaceSelector)
if err != nil {
return nil, fmt.Errorf("namespaceSelector: %w", err)
}
nsSel = s
}
// Decide which namespaces are in scope.
var inScope []string
if nsSel == nil {
// Pod-only selector → just the policy's own namespace.
inScope = []string{policyNS}
} else {
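for name, ns := range nsByName {
if nsSel.Matches(labels.Set(ns.Labels)) {
inScope = append(inScope, name)
}
}
// Map iteration order is unspecified; sort so the resulting CIDR
// order (and hence the rendered script) is stable across passes.
sort.Strings(inScope)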
}
var out []*net.IPNet
for _, ns := range inScope {
for _, pp := range peerPodsByNS[ns] {
if podSel != nil && !podSel.Matches(labels.Set(pp.Labels)) {
continue
}
for _, ip := range pp.IPs {
out = append(out, ipToHostCIDR(ip))
}
}
}
return out, nil
}
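The four selector cases listed in the resolvePodNamespacePeer comment can be exercised with a simplified model. This sketch assumes equality-only label matching in place of labels.Selector; the `peer` type and the `matches` and `resolve` helpers are hypothetical. Go does not specify map iteration order, so the sketch sorts the in-scope namespace names before walking them.

```go
package main

import (
	"fmt"
	"sort"
)

// peer is a hypothetical stand-in for PeerPod with a single address.
type peer struct {
	ns, name, ip string
	labels       map[string]string
}

// matches is a simplified equality-only selector: every key in sel must
// be present in lbls with the same value. A non-nil empty sel matches all.
func matches(sel, lbls map[string]string) bool {
	for k, v := range sel {
		if got, ok := lbls[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// resolve returns the IPs of peers admitted by the selectors. A nil
// selector means "unset"; nsSel == nil scopes the search to policyNS.
func resolve(podSel, nsSel map[string]string, policyNS string,
	nsLabels map[string]map[string]string, peers []peer) []string {
	var scope []string
	if nsSel == nil {
		scope = []string{policyNS}
	} else {
		for name, lbls := range nsLabels {
			if matches(nsSel, lbls) {
				scope = append(scope, name)
			}
		}
		sort.Strings(scope) // map order is unspecified; sort for stable output
	}
	var out []string
	for _, ns := range scope {
		for _, p := range peers {
			if p.ns != ns || (podSel != nil && !matches(podSel, p.labels)) {
				continue
			}
			out = append(out, p.ip)
		}
	}
	return out
}

func main() {
	nsLabels := map[string]map[string]string{
		"ns1": {}, "trusted-1": {"tier": "trusted"}, "untrusted": {"tier": "wild"},
	}
	peers := []peer{
		{ns: "ns1", name: "c", ip: "2001:db8::2", labels: map[string]string{"app": "client"}},
		{ns: "trusted-1", name: "a", ip: "2001:db8::a"},
		{ns: "untrusted", name: "x", ip: "2001:db8::ff"},
	}
	fmt.Println(resolve(map[string]string{"app": "client"}, nil, "ns1", nsLabels, peers))
	fmt.Println(resolve(nil, map[string]string{"tier": "trusted"}, "ns1", nsLabels, peers))
}
```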
// translatePorts converts NetworkPolicyPort entries into PortMatch.
//
// A nil/empty Ports list on a NetworkPolicy rule means "all ports" by
// spec; we represent that as a single zero-valued PortMatch (any proto,
// any port) so the renderer can emit a single rule rather than a chain
// of port-equality matches.
func translatePorts(ports []netv1.NetworkPolicyPort) ([]PortMatch, error) {
if len(ports) == 0 {
return []PortMatch{{}}, nil
}
var out []PortMatch
for i, p := range ports {
var protoStr string
if p.Protocol != nil {
switch *p.Protocol {
case "TCP":
protoStr = "tcp"
case "UDP":
protoStr = "udp"
case "SCTP":
protoStr = "sctp"
default:
return nil, fmt.Errorf("port[%d]: protocol %q not supported", i, *p.Protocol)
}
} else {
// Spec default: TCP. The empty-string "any of the three" form
// is reserved for a rule with no Ports entries at all (handled
// above); an explicit entry without a protocol defaults to
// TCP, whether or not it also sets a Port.
protoStr = "tcp"
}
var port, endPort int
if p.Port != nil {
if p.Port.Type != 0 { // intstr.Int = 0; intstr.String = 1
return nil, fmt.Errorf("port[%d]: named ports are not yet supported", i)
}
port = int(p.Port.IntVal)
}
if p.EndPort != nil {
endPort = int(*p.EndPort)
if endPort < port {
return nil, fmt.Errorf("port[%d]: endPort %d < port %d", i, endPort, port)
}
}
out = append(out, PortMatch{Protocol: protoStr, Port: port, EndPort: endPort})
}
return out, nil
}
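The port rules above (empty list means any port, missing protocol defaults to TCP, a range must not end before it starts) can be sketched without the Kubernetes types. `portSpec`, `portMatch` and `normalise` are hypothetical simplifications; in particular, named ports and protocol validation are left out.

```go
package main

import "fmt"

// portMatch is a hypothetical stand-in for PortMatch.
type portMatch struct {
	proto         string
	port, endPort int
}

// portSpec is a hypothetical, simplified NetworkPolicyPort: an empty
// proto means "not set", and zero port / endPort mean "not set".
type portSpec struct {
	proto         string
	port, endPort int
}

// normalise applies the rules described above: no entries means one
// match-anything portMatch; a missing protocol defaults to TCP; a range
// must not end before it starts.
func normalise(specs []portSpec) ([]portMatch, error) {
	if len(specs) == 0 {
		return []portMatch{{}}, nil
	}
	out := make([]portMatch, 0, len(specs))
	for i, s := range specs {
		proto := s.proto
		if proto == "" {
			proto = "tcp"
		}
		if s.endPort != 0 && s.endPort < s.port {
			return nil, fmt.Errorf("port[%d]: endPort %d < port %d", i, s.endPort, s.port)
		}
		out = append(out, portMatch{proto: proto, port: s.port, endPort: s.endPort})
	}
	return out, nil
}

func main() {
	fmt.Println(normalise(nil))
	fmt.Println(normalise([]portSpec{{port: 8000, endPort: 8999}}))
	fmt.Println(normalise([]portSpec{{proto: "udp", port: 9000, endPort: 8000}}))
}
```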
// assembleRules combines one resolved peer-set with the rule's port list.
// It emits at most one Rule per source rule: the peer-set is the
// expensive shared field, so ports go inline rather than being expanded
// into a peer × port cross-product. allowAll peers result in a rule with
// no PeerCIDRs, which the renderer treats as "any source" (ingress) or
// "any destination" (egress).
func assembleRules(pod Pod, dir Direction, peers peerSet, ports []PortMatch) []Rule {
if !peers.allowAll && len(peers.CIDRs) == 0 {
// Selector matched no peers (e.g. podSelector for a label that
// no live pod has). Emit nothing — the rule cannot allow any
// real traffic. The pod stays in default-deny for this rule.
return nil
}
r := Rule{
PodKey: pod.Namespace + "/" + pod.Name,
HostIface: pod.HostIface,
PodIPs: append([]net.IP(nil), pod.IPs...),
Direction: dir,
Action: ActionAccept,
Ports: append([]PortMatch(nil), ports...),
}
if !peers.allowAll {
r.PeerCIDRs = append([]*net.IPNet(nil), peers.CIDRs...)
r.PeerExcept = append([]*net.IPNet(nil), peers.Except...)
}
return []Rule{r}
}
// canonicalisePolicies sorts the policy slice by (namespace, name) so the
// translator's output is deterministic regardless of informer event order.
func canonicalisePolicies(p []netv1.NetworkPolicy) []netv1.NetworkPolicy {
out := append([]netv1.NetworkPolicy(nil), p...)
sort.Slice(out, func(i, j int) bool {
if out[i].Namespace != out[j].Namespace {
return out[i].Namespace < out[j].Namespace
}
return out[i].Name < out[j].Name
})
return out
}
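The copy-then-sort approach used by canonicalisePolicies can be shown on a plain key type. `policyKey` and `canonical` are hypothetical names; the point of the sketch is that the caller's slice is left in its original order while the translator sees a stable one.

```go
package main

import (
	"fmt"
	"sort"
)

// policyKey is a hypothetical stand-in for a NetworkPolicy's identity.
type policyKey struct{ ns, name string }

// canonical returns a sorted copy ordered by (ns, name); the input
// slice is left untouched.
func canonical(in []policyKey) []policyKey {
	out := append([]policyKey(nil), in...)
	sort.Slice(out, func(i, j int) bool {
		if out[i].ns != out[j].ns {
			return out[i].ns < out[j].ns
		}
		return out[i].name < out[j].name
	})
	return out
}

func main() {
	in := []policyKey{{"storage", "b"}, {"default", "z"}, {"default", "a"}}
	fmt.Println(canonical(in))
	fmt.Println(in) // unchanged
}
```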
func indexNamespaces(nss []Namespace) map[string]Namespace {
out := make(map[string]Namespace, len(nss))
for _, ns := range nss {
out[ns.Name] = ns
}
return out
}
func indexPeerPods(pods []PeerPod) map[string][]PeerPod {
out := map[string][]PeerPod{}
for _, p := range pods {
out[p.Namespace] = append(out[p.Namespace], p)
}
// Sort each namespace's pod list by (name) so the translator's IP
// ordering is stable.
for k := range out {
sort.Slice(out[k], func(i, j int) bool { return out[k][i].Name < out[k][j].Name })
}
return out
}
// ipToHostCIDR returns ip/32 (v4) or ip/128 (v6) — the smallest CIDR
// covering exactly that one address.
func ipToHostCIDR(ip net.IP) *net.IPNet {
if v4 := ip.To4(); v4 != nil {
return &net.IPNet{IP: v4, Mask: net.CIDRMask(32, 32)}
}
return &net.IPNet{IP: ip.To16(), Mask: net.CIDRMask(128, 128)}
}
+147
@@ -0,0 +1,147 @@
package netpol
import (
"net"
"strings"
"testing"
corev1 "k8s.io/api/core/v1"
netv1 "k8s.io/api/networking/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/intstr"
)
// FuzzTranslate_AndRender stitches the Translator and Renderer together
// against synthetic NetworkPolicies built from fuzzed bytes. We are not
// trying to produce *valid* policies — the goal is to confirm that:
//
// 1. Neither stage panics on weird input.
// 2. Render output is balanced (every "{" has a matching "}").
// 3. Rendering twice is byte-stable.
// 4. The Pods set in Output is consistent with Isolated (every isolated
// PodKey has a matching entry in Pods).
//
// A no-op warn callback is passed so the warning paths run under the
// fuzzer; the messages themselves are not asserted on.
func FuzzTranslate_AndRender(f *testing.F) {
type seed struct {
policyNS, policyName string
podSelectorKey, podSelValue string
peerSelectorKey, peerSelV string
peerNS, peerName, peerIP string
port uint16
ipBlockCIDR, ipBlockExcept string
}
for _, s := range []seed{
{policyNS: "ns1", policyName: "p1", podSelectorKey: "app", podSelValue: "web", port: 80},
{policyNS: "ns1", policyName: "p1", peerSelectorKey: "app", peerSelV: "client", peerNS: "ns1", peerName: "c1", peerIP: "2001:db8::aa", port: 443},
{policyNS: "ns1", policyName: "p1", ipBlockCIDR: "10.0.0.0/8", ipBlockExcept: "10.99.0.0/16", port: 0},
{policyNS: "", policyName: ""}, // pathological
{policyNS: "ns1", policyName: "p1", podSelectorKey: "app\x00", podSelValue: "web\nnewline"},
{policyNS: "ns1", policyName: "p1", port: 65535},
{policyNS: "ns1", policyName: "p1", port: 1},
} {
f.Add(s.policyNS, s.policyName, s.podSelectorKey, s.podSelValue,
s.peerSelectorKey, s.peerSelV, s.peerNS, s.peerName, s.peerIP,
s.port, s.ipBlockCIDR, s.ipBlockExcept)
}
f.Fuzz(func(t *testing.T,
policyNS, policyName,
podSelectorKey, podSelValue,
peerSelectorKey, peerSelV,
peerNS, peerName, peerIP string,
port uint16,
ipBlockCIDR, ipBlockExcept string,
) {
// Build a synthetic policy.
policy := netv1.NetworkPolicy{
ObjectMeta: metav1.ObjectMeta{Namespace: policyNS, Name: policyName},
Spec: netv1.NetworkPolicySpec{
PolicyTypes: []netv1.PolicyType{netv1.PolicyTypeIngress},
},
}
if podSelectorKey != "" {
policy.Spec.PodSelector = metav1.LabelSelector{
MatchLabels: map[string]string{podSelectorKey: podSelValue},
}
} else {
policy.Spec.PodSelector = metav1.LabelSelector{}
}
ingress := netv1.NetworkPolicyIngressRule{}
if peerSelectorKey != "" {
ingress.From = append(ingress.From, netv1.NetworkPolicyPeer{
PodSelector: &metav1.LabelSelector{
MatchLabels: map[string]string{peerSelectorKey: peerSelV},
},
})
}
if ipBlockCIDR != "" {
peer := netv1.NetworkPolicyPeer{
IPBlock: &netv1.IPBlock{CIDR: ipBlockCIDR},
}
if ipBlockExcept != "" {
peer.IPBlock.Except = []string{ipBlockExcept}
}
ingress.From = append(ingress.From, peer)
}
if port != 0 {
tcp := corev1.ProtocolTCP
p := intstr.FromInt32(int32(port))
ingress.Ports = append(ingress.Ports, netv1.NetworkPolicyPort{
Protocol: &tcp, Port: &p,
})
}
policy.Spec.Ingress = append(policy.Spec.Ingress, ingress)
// Local pod, possibly matching the policy.
pod := Pod{
Namespace: "ns1", Name: "web",
Labels: map[string]string{podSelectorKey: podSelValue, "app": "web"},
HostIface: "flock00000001",
IPs: []net.IP{mustIP("2001:db8::1")},
}
// Peer pod, possibly matching the peer selector.
var peers []PeerPod
if peerName != "" {
peerIPParsed := net.ParseIP(peerIP)
if peerIPParsed != nil {
peers = append(peers, PeerPod{
Namespace: peerNS, Name: peerName,
Labels: map[string]string{peerSelectorKey: peerSelV},
IPs: []net.IP{peerIPParsed},
})
}
}
out, err := Translate(Inputs{
LocalPods: []Pod{pod},
PeerPods: peers,
Namespaces: []Namespace{
{Name: "ns1", Labels: map[string]string{"kubernetes.io/metadata.name": "ns1"}},
},
Policies: []netv1.NetworkPolicy{policy},
}, func(string) {})
if err != nil {
return // any error is acceptable
}
// Property: every isolated PodKey appears in Output.Pods.
for iso := range out.Isolated {
if _, ok := out.Pods[iso.PodKey]; !ok {
t.Fatalf("isolated %s has no Pods entry", iso.PodKey)
}
}
script := Render(out)
// Property: balanced braces.
if got := strings.Count(script, "{") - strings.Count(script, "}"); got != 0 {
t.Fatalf("unbalanced braces (%d):\n%s", got, script)
}
// Property: deterministic (run again, compare).
script2 := Render(out)
if script != script2 {
t.Fatalf("Render not deterministic")
}
})
}
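The brace property in the fuzz target compares total counts only. A slightly stricter variant, sketched here as a hypothetical helper that is not part of the package, also rejects a closing brace that appears before its opener. It is a plain character scan and does not understand nft quoting or comments.

```go
package main

import "fmt"

// bracesBalanced reports whether every "}" closes an earlier "{" and no
// "{" is left open. It does not understand quoting or comments.
func bracesBalanced(s string) bool {
	depth := 0
	for _, r := range s {
		switch r {
		case '{':
			depth++
		case '}':
			depth--
			if depth < 0 {
				return false
			}
		}
	}
	return depth == 0
}

func main() {
	fmt.Println(bracesBalanced("table inet t { chain c { } }"))
	fmt.Println(bracesBalanced("} {")) // equal counts, wrong order
}
```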
+452
@@ -0,0 +1,452 @@
package netpol
import (
"net"
"testing"
corev1 "k8s.io/api/core/v1"
netv1 "k8s.io/api/networking/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/intstr"
)
func mustIP(s string) net.IP {
ip := net.ParseIP(s)
if ip == nil {
panic("bad IP: " + s)
}
return ip
}
func newPolicy(ns, name string, mods ...func(*netv1.NetworkPolicy)) netv1.NetworkPolicy {
p := netv1.NetworkPolicy{
ObjectMeta: metav1.ObjectMeta{Namespace: ns, Name: name},
Spec: netv1.NetworkPolicySpec{},
}
for _, m := range mods {
m(&p)
}
return p
}
func tcpPort(port int) netv1.NetworkPolicyPort {
proto := corev1.ProtocolTCP
p := intstr.FromInt32(int32(port))
return netv1.NetworkPolicyPort{Protocol: &proto, Port: &p}
}
// Pod-only selector that matches everything (`{}`).
func emptySelector() *metav1.LabelSelector {
return &metav1.LabelSelector{}
}
func selectorMatching(kv map[string]string) *metav1.LabelSelector {
return &metav1.LabelSelector{MatchLabels: kv}
}
// isolationFor reports whether the given pod is isolated for ingress (in)
// and for egress (eg).
func isolationFor(out Output, podKey string) (in, eg bool) {
if _, ok := out.Isolated[Isolation{PodKey: podKey, Direction: DirIngress}]; ok {
in = true
}
if _, ok := out.Isolated[Isolation{PodKey: podKey, Direction: DirEgress}]; ok {
eg = true
}
return
}
// TestTranslate_NoPolicies — pod with no matching policy is unrestricted.
func TestTranslate_NoPolicies(t *testing.T) {
pod := Pod{
Namespace: "ns1", Name: "p1",
Labels: map[string]string{"app": "web"},
HostIface: "flock00000001",
IPs: []net.IP{mustIP("2001:db8::1")},
}
out, err := Translate(Inputs{LocalPods: []Pod{pod}}, nil)
if err != nil {
t.Fatal(err)
}
if len(out.Rules) != 0 {
t.Fatalf("expected no rules, got %d", len(out.Rules))
}
in, eg := isolationFor(out, "ns1/p1")
if in || eg {
t.Fatalf("pod should not be isolated: in=%v eg=%v", in, eg)
}
}
// TestTranslate_DefaultDenyIngress: a policy with empty Ingress and
// PolicyTypes = [Ingress] selects the pod and isolates it; no allow rules
// are emitted.
func TestTranslate_DefaultDenyIngress(t *testing.T) {
pod := Pod{
Namespace: "ns1", Name: "web",
Labels: map[string]string{"app": "web"},
HostIface: "flock00000001",
IPs: []net.IP{mustIP("2001:db8::1")},
}
policy := newPolicy("ns1", "default-deny", func(p *netv1.NetworkPolicy) {
p.Spec.PodSelector = *emptySelector()
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
})
out, err := Translate(Inputs{
LocalPods: []Pod{pod},
Policies: []netv1.NetworkPolicy{policy},
}, nil)
if err != nil {
t.Fatal(err)
}
if len(out.Rules) != 0 {
t.Fatalf("expected no rules from a deny-all, got %d", len(out.Rules))
}
in, eg := isolationFor(out, "ns1/web")
if !in {
t.Fatalf("ingress should be isolated")
}
if eg {
t.Fatalf("egress should NOT be isolated (policy only set ingress)")
}
}
// TestTranslate_DefaultDenyEgress_InferredFromEgressList — when
// PolicyTypes is omitted but Spec.Egress is non-empty, egress should
// also be isolated by inference.
func TestTranslate_DefaultDenyEgress_InferredFromEgressList(t *testing.T) {
pod := Pod{
Namespace: "ns1", Name: "web",
Labels: map[string]string{"app": "web"},
HostIface: "f1", IPs: []net.IP{mustIP("2001:db8::1")},
}
policy := newPolicy("ns1", "egress-rule", func(p *netv1.NetworkPolicy) {
p.Spec.PodSelector = *emptySelector()
p.Spec.Egress = []netv1.NetworkPolicyEgressRule{{}}
})
out, _ := Translate(Inputs{LocalPods: []Pod{pod}, Policies: []netv1.NetworkPolicy{policy}}, nil)
in, eg := isolationFor(out, "ns1/web")
if !in || !eg {
t.Fatalf("both directions should be isolated: in=%v eg=%v", in, eg)
}
}
// TestTranslate_PodSelectorPeer: the peer is a single pod in the same
// namespace, identified by label.
func TestTranslate_PodSelectorPeer(t *testing.T) {
web := Pod{
Namespace: "ns1", Name: "web",
Labels: map[string]string{"app": "web"},
HostIface: "f1", IPs: []net.IP{mustIP("2001:db8::1")},
}
clientIP := mustIP("2001:db8::2")
peer := PeerPod{
Namespace: "ns1", Name: "client",
Labels: map[string]string{"app": "client"},
IPs: []net.IP{clientIP},
}
policy := newPolicy("ns1", "allow-from-client", func(p *netv1.NetworkPolicy) {
p.Spec.PodSelector = *selectorMatching(map[string]string{"app": "web"})
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
From: []netv1.NetworkPolicyPeer{{
PodSelector: selectorMatching(map[string]string{"app": "client"}),
}},
Ports: []netv1.NetworkPolicyPort{tcpPort(80)},
}}
})
out, err := Translate(Inputs{
LocalPods: []Pod{web},
PeerPods: []PeerPod{peer},
Policies: []netv1.NetworkPolicy{policy},
}, nil)
if err != nil {
t.Fatal(err)
}
if len(out.Rules) != 1 {
t.Fatalf("expected 1 rule, got %d: %+v", len(out.Rules), out.Rules)
}
r := out.Rules[0]
if r.PodKey != "ns1/web" || r.Direction != DirIngress {
t.Fatalf("rule has wrong subject: %+v", r)
}
if len(r.PeerCIDRs) != 1 || !r.PeerCIDRs[0].IP.Equal(clientIP) {
t.Fatalf("peer CIDR wrong: %+v", r.PeerCIDRs)
}
if len(r.Ports) != 1 || r.Ports[0].Protocol != "tcp" || r.Ports[0].Port != 80 {
t.Fatalf("port wrong: %+v", r.Ports)
}
}
// TestTranslate_NamespaceSelector — peer is "every pod in any namespace
// with label tier=trusted".
func TestTranslate_NamespaceSelector(t *testing.T) {
web := Pod{
Namespace: "ns1", Name: "web",
Labels: map[string]string{"app": "web"},
HostIface: "f1", IPs: []net.IP{mustIP("2001:db8::1")},
}
out, err := Translate(Inputs{
LocalPods: []Pod{web},
Namespaces: []Namespace{
{Name: "ns1", Labels: map[string]string{}},
{Name: "trusted-1", Labels: map[string]string{"tier": "trusted"}},
{Name: "trusted-2", Labels: map[string]string{"tier": "trusted"}},
{Name: "untrusted", Labels: map[string]string{"tier": "wild"}},
},
PeerPods: []PeerPod{
{Namespace: "trusted-1", Name: "a", IPs: []net.IP{mustIP("2001:db8::a")}},
{Namespace: "trusted-2", Name: "b", IPs: []net.IP{mustIP("2001:db8::b")}},
{Namespace: "untrusted", Name: "x", IPs: []net.IP{mustIP("2001:db8::ff")}},
},
Policies: []netv1.NetworkPolicy{newPolicy("ns1", "allow-trusted", func(p *netv1.NetworkPolicy) {
p.Spec.PodSelector = *emptySelector()
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
From: []netv1.NetworkPolicyPeer{{
NamespaceSelector: selectorMatching(map[string]string{"tier": "trusted"}),
}},
}}
})},
}, nil)
if err != nil {
t.Fatal(err)
}
if len(out.Rules) != 1 {
t.Fatalf("expected 1 rule, got %d", len(out.Rules))
}
got := map[string]bool{}
for _, c := range out.Rules[0].PeerCIDRs {
got[c.IP.String()] = true
}
if !got["2001:db8::a"] || !got["2001:db8::b"] {
t.Fatalf("trusted pod IPs missing: %v", got)
}
if got["2001:db8::ff"] {
t.Fatalf("untrusted pod IP leaked into rule")
}
}
// TestTranslate_IPBlockWithExcept — ipBlock with an except range.
func TestTranslate_IPBlockWithExcept(t *testing.T) {
pod := Pod{
Namespace: "ns1", Name: "web", HostIface: "f1",
Labels: map[string]string{"app": "web"},
IPs: []net.IP{mustIP("10.0.0.1")},
}
policy := newPolicy("ns1", "ipblock", func(p *netv1.NetworkPolicy) {
p.Spec.PodSelector = *emptySelector()
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
From: []netv1.NetworkPolicyPeer{{
IPBlock: &netv1.IPBlock{
CIDR: "10.0.0.0/8",
Except: []string{"10.99.0.0/16", "10.42.42.0/24"},
},
}},
}}
})
out, err := Translate(Inputs{
LocalPods: []Pod{pod},
Policies: []netv1.NetworkPolicy{policy},
}, nil)
if err != nil {
t.Fatal(err)
}
if len(out.Rules) != 1 {
t.Fatalf("expected 1 rule, got %d", len(out.Rules))
}
r := out.Rules[0]
if len(r.PeerCIDRs) != 1 || r.PeerCIDRs[0].String() != "10.0.0.0/8" {
t.Fatalf("peer CIDR wrong: %v", r.PeerCIDRs)
}
if len(r.PeerExcept) != 2 {
t.Fatalf("expected 2 except, got %d", len(r.PeerExcept))
}
}
// TestTranslate_AllowAllPeers — empty From list means "from anywhere".
func TestTranslate_AllowAllPeers(t *testing.T) {
pod := Pod{
Namespace: "ns1", Name: "web", HostIface: "f1",
Labels: map[string]string{"app": "web"},
IPs: []net.IP{mustIP("2001:db8::1")},
}
policy := newPolicy("ns1", "allow-all-on-port", func(p *netv1.NetworkPolicy) {
p.Spec.PodSelector = *emptySelector()
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
Ports: []netv1.NetworkPolicyPort{tcpPort(443)},
}}
})
out, _ := Translate(Inputs{LocalPods: []Pod{pod}, Policies: []netv1.NetworkPolicy{policy}}, nil)
if len(out.Rules) != 1 {
t.Fatalf("expected 1 rule, got %d", len(out.Rules))
}
r := out.Rules[0]
if len(r.PeerCIDRs) != 0 || len(r.PeerExcept) != 0 {
t.Fatalf("expected allow-all peers, got CIDRs=%v Except=%v", r.PeerCIDRs, r.PeerExcept)
}
}
// TestTranslate_AllowAllPorts — empty Ports list means "all ports".
func TestTranslate_AllowAllPorts(t *testing.T) {
pod := Pod{
Namespace: "ns1", Name: "web", HostIface: "f1",
Labels: map[string]string{"app": "web"},
IPs: []net.IP{mustIP("2001:db8::1")},
}
policy := newPolicy("ns1", "allow-from-all", func(p *netv1.NetworkPolicy) {
p.Spec.PodSelector = *emptySelector()
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
From: []netv1.NetworkPolicyPeer{{
PodSelector: emptySelector(),
}},
}}
})
peer := PeerPod{
Namespace: "ns1", Name: "x",
IPs: []net.IP{mustIP("2001:db8::aa")},
}
out, _ := Translate(Inputs{
LocalPods: []Pod{pod}, PeerPods: []PeerPod{peer},
Policies: []netv1.NetworkPolicy{policy},
}, nil)
if len(out.Rules) != 1 {
t.Fatalf("expected 1 rule, got %d", len(out.Rules))
}
r := out.Rules[0]
if len(r.Ports) != 1 || r.Ports[0] != (PortMatch{}) {
t.Fatalf("expected single any-port match, got %+v", r.Ports)
}
}
// TestTranslate_PortRange — endPort field.
func TestTranslate_PortRange(t *testing.T) {
pod := Pod{
Namespace: "ns1", Name: "web", HostIface: "f1",
Labels: map[string]string{"app": "web"},
IPs: []net.IP{mustIP("2001:db8::1")},
}
policy := newPolicy("ns1", "range", func(p *netv1.NetworkPolicy) {
p.Spec.PodSelector = *emptySelector()
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
proto := corev1.ProtocolTCP
port := intstr.FromInt32(8000)
end := int32(8999)
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
Ports: []netv1.NetworkPolicyPort{{Protocol: &proto, Port: &port, EndPort: &end}},
}}
})
out, _ := Translate(Inputs{LocalPods: []Pod{pod}, Policies: []netv1.NetworkPolicy{policy}}, nil)
if len(out.Rules) != 1 || out.Rules[0].Ports[0].Port != 8000 || out.Rules[0].Ports[0].EndPort != 8999 {
t.Fatalf("range not preserved: %+v", out.Rules)
}
}
// TestTranslate_NamedPortRejected — named ports aren't supported yet;
// translator must skip the rule and warn.
func TestTranslate_NamedPortRejected(t *testing.T) {
pod := Pod{
Namespace: "ns1", Name: "web", HostIface: "f1",
Labels: map[string]string{"app": "web"},
IPs: []net.IP{mustIP("2001:db8::1")},
}
proto := corev1.ProtocolTCP
named := intstr.FromString("http")
policy := newPolicy("ns1", "named", func(p *netv1.NetworkPolicy) {
p.Spec.PodSelector = *emptySelector()
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
Ports: []netv1.NetworkPolicyPort{{Protocol: &proto, Port: &named}},
}}
})
var warns []string
out, _ := Translate(Inputs{LocalPods: []Pod{pod}, Policies: []netv1.NetworkPolicy{policy}}, func(s string) {
warns = append(warns, s)
})
if len(out.Rules) != 0 {
t.Fatalf("expected named-port rule to be skipped")
}
if len(warns) == 0 {
t.Fatalf("expected a warning about named ports")
}
// The pod should still be isolated since the policy selected it.
in, _ := isolationFor(out, "ns1/web")
if !in {
t.Fatalf("pod should be isolated even when its rule is dropped")
}
}
// TestTranslate_PolicyScopedToNamespace: a policy in nsA does NOT select
// pods in nsB even if their labels match.
func TestTranslate_PolicyScopedToNamespace(t *testing.T) {
a := Pod{Namespace: "nsA", Name: "p", HostIface: "f1",
Labels: map[string]string{"app": "web"}, IPs: []net.IP{mustIP("2001:db8::1")}}
b := Pod{Namespace: "nsB", Name: "p", HostIface: "f2",
Labels: map[string]string{"app": "web"}, IPs: []net.IP{mustIP("2001:db8::2")}}
policy := newPolicy("nsA", "deny", func(p *netv1.NetworkPolicy) {
p.Spec.PodSelector = *selectorMatching(map[string]string{"app": "web"})
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
})
out, _ := Translate(Inputs{LocalPods: []Pod{a, b}, Policies: []netv1.NetworkPolicy{policy}}, nil)
inA, _ := isolationFor(out, "nsA/p")
inB, _ := isolationFor(out, "nsB/p")
if !inA {
t.Fatalf("nsA/p should be isolated")
}
if inB {
t.Fatalf("nsB/p must NOT be isolated by a policy in nsA")
}
}
// TestTranslate_PodWithoutAllocationSkipped — pod with no IPs is silently
// skipped (its rule could not match any traffic anyway).
func TestTranslate_PodWithoutAllocationSkipped(t *testing.T) {
pod := Pod{Namespace: "ns1", Name: "p", HostIface: "f1",
Labels: map[string]string{"app": "web"}}
policy := newPolicy("ns1", "deny", func(p *netv1.NetworkPolicy) {
p.Spec.PodSelector = *emptySelector()
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
})
out, _ := Translate(Inputs{LocalPods: []Pod{pod}, Policies: []netv1.NetworkPolicy{policy}}, nil)
in, _ := isolationFor(out, "ns1/p")
if in {
t.Fatalf("pod without IP should not appear in output")
}
}
// TestTranslate_Determinism — translating the same Inputs twice produces
// equal outputs (Rules in equal order, Isolated equal).
func TestTranslate_Determinism(t *testing.T) {
pod := Pod{
Namespace: "ns1", Name: "web", HostIface: "f1",
Labels: map[string]string{"app": "web"},
IPs: []net.IP{mustIP("2001:db8::1")},
}
peers := []PeerPod{
{Namespace: "ns1", Name: "z", Labels: map[string]string{"app": "client"}, IPs: []net.IP{mustIP("2001:db8::2")}},
{Namespace: "ns1", Name: "a", Labels: map[string]string{"app": "client"}, IPs: []net.IP{mustIP("2001:db8::3")}},
}
policies := []netv1.NetworkPolicy{
newPolicy("ns1", "z-second", func(p *netv1.NetworkPolicy) {
p.Spec.PodSelector = *emptySelector()
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
From: []netv1.NetworkPolicyPeer{{
PodSelector: selectorMatching(map[string]string{"app": "client"}),
}},
}}
}),
}
in := Inputs{LocalPods: []Pod{pod}, PeerPods: peers, Policies: policies}
a, _ := Translate(in, nil)
b, _ := Translate(in, nil)
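if len(a.Rules) != len(b.Rules) {
t.Fatalf("rule count differs: %d vs %d", len(a.Rules), len(b.Rules))
}
if len(a.Isolated) != len(b.Isolated) {
t.Fatalf("isolation set size differs: %d vs %d", len(a.Isolated), len(b.Isolated))
}
for i := range a.Rules {
if a.Rules[i].PodKey != b.Rules[i].PodKey || len(a.Rules[i].PeerCIDRs) != len(b.Rules[i].PeerCIDRs) {
t.Fatalf("rule[%d] differs", i)
}
// Peer order must match too, not just the count.
for j := range a.Rules[i].PeerCIDRs {
if a.Rules[i].PeerCIDRs[j].String() != b.Rules[i].PeerCIDRs[j].String() {
t.Fatalf("rule[%d] peer[%d] differs", i, j)
}
}
}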
}
+147
@@ -0,0 +1,147 @@
package netpol
import "net"
// Direction is the NetworkPolicy direction, named from the *pod's*
// perspective (matching the NetworkPolicy API). "Ingress" is traffic
// arriving at the pod; "Egress" is traffic the pod initiates.
//
// Note that on the host this maps the opposite way at the veth: an
// Ingress rule matches packets whose oifname is the pod's host-side veth
// (the kernel is forwarding into the pod), and an Egress rule matches
// packets whose iifname is the pod's host-side veth (the kernel just
// received from the pod).
type Direction int
const (
DirIngress Direction = iota
DirEgress
)
// String returns the lower-case wire form ("ingress" / "egress").
func (d Direction) String() string {
if d == DirEgress {
return "egress"
}
return "ingress"
}
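The veth mapping described in the Direction comment can be sketched as a tiny helper that picks the nft interface match for a direction. `ifaceMatch` and the local `direction` type are hypothetical; the renderer's actual output is not shown in this section, so treat the strings below as an illustration of the mapping only.

```go
package main

import "fmt"

// direction is a hypothetical local copy of the Direction enum.
type direction int

const (
	ingress direction = iota
	egress
)

// ifaceMatch returns the nft interface match for a pod's host-side veth:
// ingress traffic leaves the host through the veth (oifname), egress
// traffic arrives on it (iifname).
func ifaceMatch(d direction, hostIface string) string {
	if d == egress {
		return fmt.Sprintf("iifname %q", hostIface)
	}
	return fmt.Sprintf("oifname %q", hostIface)
}

func main() {
	fmt.Println(ifaceMatch(ingress, "flock00000001"))
	fmt.Println(ifaceMatch(egress, "flock00000001"))
}
```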
// Pod is the local-pod information the translator needs. The reconciler
// populates this from its store of CNI allocations — every pod with a
// committed allocation on this node appears here.
type Pod struct {
// Namespace + Name uniquely identify the pod.
Namespace string
Name string
// Labels are the pod labels. NetworkPolicy.Spec.PodSelector matches
// against these.
Labels map[string]string
// HostIface is the host-side veth name (e.g. "flock1a2b3c4d"). All
// rules guarding this pod hook off iifname/oifname == HostIface.
HostIface string
// IPs are the pod's eth0 addresses (IPv6 and/or IPv4). Empty means
// the agent has no allocation for this pod yet — translator should
// skip such pods.
IPs []net.IP
}
// PeerPod is a (potentially remote) pod whose IPs may be referenced as a
// NetworkPolicy peer. The translator resolves podSelector +
// namespaceSelector peers to their IPs by walking the cluster-wide
// peer-pod set.
type PeerPod struct {
Namespace string
Name string
Labels map[string]string
IPs []net.IP
}
// Namespace carries just enough metadata for namespaceSelector matching.
type Namespace struct {
Name string
Labels map[string]string
}
// LocalPod is the renderer-visible subset of a local pod — just enough
// to anchor a base-chain jump. Carried in Output so the renderer can
// emit chains for default-deny pods that have no explicit allow rules.
type LocalPod struct {
PodKey string
HostIface string
IPs []net.IP
}
// PortMatch is one allowed (protocol, port) tuple. EndPort is inclusive;
// when zero the rule matches the single Port.
type PortMatch struct {
Protocol string // "tcp", "udp", "sctp"; empty means "any of the three"
Port int // 1..65535. Zero means "any port".
EndPort int // 0 if not a range; otherwise inclusive range end.
}
// Rule is the canonical intermediate representation between the translator
// and the renderer. One Rule is one accept-line in the rendered nft
// script. A pod's chain is the ordered concatenation of every Rule whose
// PodKey matches; any packet that falls off the end is denied by the
// trailing default-deny verdict (the chain has policy drop).
//
// PeerCIDRs are OR'd together, then PeerExcept is subtracted. Empty
// PeerCIDRs + empty PeerExcept means "any source/destination".
type Rule struct {
// PodKey is namespace/name of the pod this rule guards. Used by the
// renderer to slot the rule into the correct chain.
PodKey string
// HostIface is the pod's host-side veth name; the renderer uses it
// to anchor the base-chain jump.
HostIface string
// PodIPs are the pod's eth0 addresses. The base chain matches on
// (oifname == HostIface AND daddr ∈ PodIPs) for ingress, and
// (iifname == HostIface AND saddr ∈ PodIPs) for egress, so packets
// that cross the veth but are not addressed to (ingress) or sourced
// from (egress) the pod's own address are not treated as
// policy-protected.
PodIPs []net.IP
// Direction is Ingress or Egress, named from the pod's perspective.
Direction Direction
// Action is the verdict for this rule. The v1 translator only emits
// ActionAccept; default-deny is implicit in the chain's policy drop
// and is not represented as a Rule. ActionDrop is reserved for future
// deny-list semantics such as AdminNetworkPolicy.
Action Action
// PeerCIDRs are the addresses of allowed peers. OR'd together.
// Empty means "any peer".
PeerCIDRs []*net.IPNet
// PeerExcept narrows PeerCIDRs by subtracting these ranges. Only
// meaningful with non-empty PeerCIDRs (it comes from
// ipBlock.except, which requires ipBlock.cidr).
PeerExcept []*net.IPNet
// Ports is the set of allowed (protocol, port) tuples. Empty means
// "any port / any protocol".
Ports []PortMatch
}
// Action is the verdict emitted by a Rule.
type Action int
const (
// ActionAccept lets the packet through. The default-deny is implicit
// in the chain policy.
ActionAccept Action = iota
// ActionDrop is reserved for future use (AdminNetworkPolicy /
// BaselineAdminNetworkPolicy explicit denies). Not produced by the
// v1 translator.
ActionDrop
)
// String returns the nft-syntax verdict.
func (a Action) String() string {
if a == ActionDrop {
return "drop"
}
return "accept"
}
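The Direction and PortMatch comments above describe how a pod-perspective rule turns into host-side nft matches. A minimal sketch of that mapping follows. The helpers `ifaceMatch` and `portExpr` are hypothetical names introduced here for illustration, the types are trimmed local copies, and this is not the package's actual renderer. It assumes ports are always matched as destination ports, which is how the NetworkPolicy API defines them for both directions.

```go
package main

import "fmt"

// Trimmed local copies of the netpol types, for illustration only.
type Direction int

const (
	DirIngress Direction = iota
	DirEgress
)

type PortMatch struct {
	Protocol string
	Port     int
	EndPort  int
}

// ifaceMatch (hypothetical) picks the interface match for a rule:
// ingress to the pod leaves the host via the veth (oifname), egress
// from the pod enters the host via the veth (iifname).
func ifaceMatch(d Direction, hostIface string) string {
	if d == DirEgress {
		return fmt.Sprintf("iifname %q", hostIface)
	}
	return fmt.Sprintf("oifname %q", hostIface)
}

// portExpr (hypothetical) renders one PortMatch with an explicit
// protocol. The "any of the three" case (empty Protocol) is left out
// of this sketch and returns "".
func portExpr(p PortMatch) string {
	switch {
	case p.Protocol == "":
		return ""
	case p.Port == 0:
		return "meta l4proto " + p.Protocol // any port of this protocol
	case p.EndPort == 0:
		return fmt.Sprintf("%s dport %d", p.Protocol, p.Port)
	default:
		return fmt.Sprintf("%s dport %d-%d", p.Protocol, p.Port, p.EndPort)
	}
}

func main() {
	pm := PortMatch{Protocol: "tcp", Port: 8000, EndPort: 8080}
	fmt.Println(ifaceMatch(DirIngress, "flock1a2b3c4d"), portExpr(pm), "accept")
}
```

Keeping the interface choice in one small function makes the "opposite way at the veth" rule hard to get wrong in more than one place.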
+56
@@ -0,0 +1,56 @@
package agent
import (
"net"
"code.fritzlab.net/fritzlab/flock/pkg/agent/netpol"
)
// collectLocalPods bridges the agent's allocation store + pod informer
// cache into the netpol-package input shape. It returns one Pod per
// committed allocation that has a matching pod in the informer cache;
// allocations whose pod was just deleted (DEL race) are skipped.
//
// Called on every netpol reconcile pass, so it must be cheap. The work
// here is O(allocations) and reads from in-memory maps only.
func collectLocalPods(store *Store, pods *PodCache) []netpol.Pod {
allocs := store.Snapshot()
out := make([]netpol.Pod, 0, len(allocs))
for _, a := range allocs {
if a.State != StateCommitted {
continue
}
pod, ok := pods.Get(a.Namespace, a.PodName)
if !ok {
// Pod evicted but DEL hasn't fired yet; nothing to enforce.
continue
}
ips := allocationIPs(a)
if len(ips) == 0 {
continue
}
out = append(out, netpol.Pod{
Namespace: a.Namespace,
Name: a.PodName,
Labels: pod.Labels,
HostIface: HostIfaceName(a.ContainerID),
IPs: ips,
})
}
return out
}
func allocationIPs(a Allocation) []net.IP {
var out []net.IP
if a.IP6 != "" {
if ip := net.ParseIP(a.IP6); ip != nil {
out = append(out, ip)
}
}
if a.IP4 != "" {
if ip := net.ParseIP(a.IP4); ip != nil {
out = append(out, ip)
}
}
return out
}
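As a quick illustration of the helper above: unparseable or empty address strings are dropped silently, and an allocation that yields no IPs is then skipped by collectLocalPods. The sketch below uses a trimmed stand-in for `Allocation` (only the `IP6` and `IP4` fields that appear in this diff) and a loop-based variant of the same logic; it is an assumption-laden illustration, not the agent's code.

```go
package main

import (
	"fmt"
	"net"
)

// Allocation is a trimmed stand-in; only the two address fields matter here.
type Allocation struct {
	IP6 string
	IP4 string
}

// allocationIPs mirrors the helper in the diff: IPv6 first, then IPv4,
// skipping strings that are empty or do not parse as an IP.
func allocationIPs(a Allocation) []net.IP {
	var out []net.IP
	for _, s := range []string{a.IP6, a.IP4} {
		if ip := net.ParseIP(s); ip != nil {
			out = append(out, ip)
		}
	}
	return out
}

func main() {
	fmt.Println(len(allocationIPs(Allocation{IP6: "2001:db8::10", IP4: "10.0.0.10"}))) // 2
	fmt.Println(len(allocationIPs(Allocation{IP6: "2001:db8::10", IP4: "not-an-ip"}))) // 1
	fmt.Println(len(allocationIPs(Allocation{})))                                      // 0
}
```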
+19 -1
@@ -7,6 +7,8 @@ import (
"fmt"
"net"
"time"
"code.fritzlab.net/fritzlab/flock/pkg/agent/netpol"
)
// configureRuntime wires Pod informer, IPAM, netlink, and BIRD on a real
@@ -103,6 +105,17 @@ func (s *Server) configureRuntime(ctx context.Context) error {
}
}()
// NetworkPolicy enforcement.
world := netpol.NewWorld(s.Logger)
if err := world.Start(ctx, s.restCfg); err != nil {
return fmt.Errorf("netpol informers: %w", err)
}
npApplier := &netpol.Applier{}
npReconciler := netpol.NewReconciler(world, func() []netpol.Pod {
return collectLocalPods(s.Store, pods)
}, npApplier, s.Logger)
go npReconciler.Run(ctx)
handler := &PodHandler{
Node: s.Node,
Store: s.Store,
@@ -111,7 +124,12 @@ func (s *Server) configureRuntime(ctx context.Context) error {
NodeConfig: s.NodeConfig,
SetupFunc: Setup,
TeardownFunc: Teardown,
-AfterCommit: anycast.Trigger,
+AfterCommit: func() {
+anycast.Trigger()
+// Re-evaluate policy on every CNI ADD/DEL so a brand-new
+// pod's chain lands before its first packet egresses.
+npReconciler.Trigger()
+},
}
s.RPC.SetHandlers(handler.Add, handler.Del, handler.Check)
s.Logger.Info("runtime ready",
+3 -3
@@ -1,6 +1,6 @@
-// Package agent owns the in-process flock-agent runtime: IPAM, netns, state,
-// anycast, and NetworkPolicy. This file implements the durable per-node
-// allocation file at /var/lib/flock/allocations.json.
+// This file implements the durable per-node allocation file at
+// /var/lib/flock/allocations.json. The package-level doc lives in doc.go.
package agent
import (
+58 -2
@@ -1,3 +1,8 @@
// Package v1alpha1 contains the operator-facing API types for flock.
//
// Stability: alpha. The shape of these types may change in incompatible ways
// between minor releases. CRDs are versioned and the agent reads only its
// pinned version.
package v1alpha1
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@@ -6,26 +11,77 @@ import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
//
// The agent reads this on startup and via informer for live updates. There is
// no controller and no auto-allocation — purely declarative input.
//
// A NodeConfig's name MUST equal the Kubernetes node name it configures
// (NodeConfigs are cluster-scoped). The agent ignores all NodeConfigs whose
// name does not match its own node.
type NodeConfigSpec struct {
// CIDR6 is the set of IPv6 CIDRs this node owns and advertises as BGP
-// aggregates. Pod IPv6 addresses are allocated from these.
+// aggregates. Pod IPv6 addresses are allocated from these. May be empty
+// only if Defaults disables IPv6 for every pod on this node.
CIDR6 []string `json:"cidr6,omitempty"`
// CIDR4 is the set of IPv4 CIDRs this node owns and advertises as BGP
-// aggregates. Pod IPv4 addresses are allocated from these.
+// aggregates. Pod IPv4 addresses are allocated from these. May be empty
+// when no pod on this node ever opts into IPv4.
CIDR4 []string `json:"cidr4,omitempty"`
// BGP configures the BGP sessions this node establishes upstream.
BGP BGPSpec `json:"bgp"`
// Defaults sets the per-node baseline for which address families a pod
// receives when its own annotations don't say. Pod-level
// `flock.fritzlab.net/ipv6` and `flock.fritzlab.net/ipv4` annotations
// always override these defaults.
//
// When a field is unset (nil), the agent falls back to its built-in
// baseline of IPv6=true, IPv4=false. When the whole Defaults block is
// nil, both built-in defaults apply.
//
// Typical uses:
// - dual-stack node: Defaults: { ipv6: true, ipv4: true }
// - IPv4-only node: Defaults: { ipv6: false, ipv4: true }
// - default (omit Defaults entirely): IPv6-only.
//
// Validation: at least one of IPv6 or IPv4 must end up true after merging
// (annotations + defaults + built-in baseline). The agent rejects pods
// that resolve to neither.
Defaults *FamilyDefaults `json:"defaults,omitempty"`
}
// FamilyDefaults is the per-node default for which address families a pod
// receives when its annotations don't specify. Each field is a pointer so
// "unset" is distinguishable from explicit "false".
type FamilyDefaults struct {
// IPv6 is the default value for the `flock.fritzlab.net/ipv6` annotation.
// nil → fall back to the built-in baseline (true).
IPv6 *bool `json:"ipv6,omitempty"`
// IPv4 is the default value for the `flock.fritzlab.net/ipv4` annotation.
// nil → fall back to the built-in baseline (false).
IPv4 *bool `json:"ipv4,omitempty"`
}
// BGPSpec describes this node's BGP speaker configuration. Each upstream peer
// becomes one BGP session in the rendered bird.conf.
type BGPSpec struct {
// ASN is this node's local autonomous system number. flock uses private
// ASNs in the 64512-65534 range by convention but accepts any value.
ASN uint32 `json:"asn"`
// Peers is the set of upstream BGP neighbors. At least one is required
// for BGP advertisement to function. Multiple peers of the same family
// are allowed (multi-homing).
Peers []BGPPeer `json:"peers"`
}
// BGPPeer is a single upstream BGP neighbor.
type BGPPeer struct {
// Address is the peer's IP. May be IPv4 or IPv6. The agent picks an
// appropriate local source address on the same subnet.
Address string `json:"address"`
// ASN is the peer's remote ASN.
ASN uint32 `json:"asn"`
}
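The Defaults comment above describes a three-level merge: pod annotation, then node default, then built-in baseline (IPv6=true, IPv4=false), with a rejection when neither family survives. A minimal sketch of that precedence follows. `resolveFamilies`, `pick`, and `ptr` are hypothetical names, annotations are assumed to be already parsed into `*bool`, and the JSON tags are omitted; the agent's real merge code is not shown in this diff.

```go
package main

import (
	"errors"
	"fmt"
)

// FamilyDefaults is a local copy of the API type, without JSON tags.
type FamilyDefaults struct {
	IPv6 *bool
	IPv4 *bool
}

// pick returns the first non-nil candidate, else the built-in baseline.
func pick(baseline bool, candidates ...*bool) bool {
	for _, c := range candidates {
		if c != nil {
			return *c
		}
	}
	return baseline
}

// resolveFamilies (hypothetical) applies annotation > node default >
// baseline, then rejects pods that end up with no address family.
func resolveFamilies(annV6, annV4 *bool, d *FamilyDefaults) (v6, v4 bool, err error) {
	var defV6, defV4 *bool
	if d != nil {
		defV6, defV4 = d.IPv6, d.IPv4
	}
	v6 = pick(true, annV6, defV6)
	v4 = pick(false, annV4, defV4)
	if !v6 && !v4 {
		return false, false, errors.New("pod resolves to neither IPv6 nor IPv4")
	}
	return v6, v4, nil
}

func ptr(b bool) *bool { return &b }

func main() {
	// No annotations, no Defaults block: built-in baseline, IPv6-only.
	fmt.Println(resolveFamilies(nil, nil, nil))
	// IPv4-only node.
	fmt.Println(resolveFamilies(nil, nil, &FamilyDefaults{IPv6: ptr(false), IPv4: ptr(true)}))
	// Annotation turns IPv6 off on a default node: rejected.
	fmt.Println(resolveFamilies(ptr(false), nil, nil))
}
```

Pointer fields are what make this work: a plain `bool` could not tell "operator said false" apart from "operator said nothing".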
+103
@@ -0,0 +1,103 @@
package embed
import (
"net"
"testing"
)
// FuzzEmbed verifies that Embed never panics and that any successful return
// keeps the output address inside the requested network.
func FuzzEmbed(f *testing.F) {
type seed struct {
prefix string
fields string // comma-separated, mapped below to []Field
ns, pod string
image string
fallback string
nNibble byte
}
for _, s := range []seed{
{"2602:817:3000:f001::/64", "namespace,pod,image", "mail", "stalwart-0", "", "ctr", 0xe},
{"2001:db8::/64", "namespace", "ns", "p", "", "", 0},
{"2001:db8::/96", "pod", "", "podname", "", "ctr", 0xf},
{"2001:db8::/48", "namespace,pod", "ns", "p", "", "ctr", 0x1},
{"2001:db8::/120", "namespace", "n", "p", "", "ctr", 0x0}, // 8 host bits (2 nibbles)
{"2001:db8::/124", "namespace", "n", "p", "", "ctr", 0x0}, // 4 host bits (1 nibble)
{"2001:db8::/127", "namespace", "n", "p", "", "ctr", 0x0}, // not nibble-aligned
{"2001:db8::/63", "namespace", "n", "p", "", "ctr", 0x0}, // not nibble-aligned
{"2001:db8::/64", "namespace,pod,image", "", "", "sha256:abcdef0123456789aabbccddeeff00112233445566778899aabbccddeeff0011", "", 0xa},
{"2001:db8::/64", "namespace,pod,image", "", "", "", "ctr", 0xa},
{"2001:db8::/64", "namespace", "🦆", "🐧", "", "", 0},
{"2001:db8::/64", "namespace", "ns\x00\x00", "p", "", "", 0},
} {
f.Add(s.prefix, s.fields, s.ns, s.pod, s.image, s.fallback, s.nNibble)
}
f.Fuzz(func(t *testing.T, prefix, fieldsStr, ns, pod, image, fallback string, nNibble byte) {
_, network, err := net.ParseCIDR(prefix)
if err != nil {
return
}
fields, ok := decodeFields(fieldsStr)
if !ok {
return
}
got, err := Embed(network, fields, Values{
Namespace: ns,
Pod: pod,
Image: image,
ImageFallback: fallback,
}, nNibble)
if err != nil {
return
}
if !network.Contains(got) {
t.Fatalf("Embed(%s, %v) = %s, outside network", prefix, fields, got)
}
// Property: low nibble of last byte equals nNibble & 0x0F.
if want := nNibble & 0x0F; got[len(got)-1]&0x0F != want {
t.Fatalf("low nibble = %x, want %x", got[len(got)-1]&0x0F, want)
}
})
}
func decodeFields(s string) ([]Field, bool) {
if s == "" {
return nil, false
}
var out []Field
cur := []byte{}
flush := func() bool {
if len(cur) == 0 {
return true
}
switch string(cur) {
case string(FieldNamespace):
out = append(out, FieldNamespace)
case string(FieldPod):
out = append(out, FieldPod)
case string(FieldImage):
out = append(out, FieldImage)
default:
return false
}
cur = cur[:0]
return true
}
for i := 0; i < len(s); i++ {
if s[i] == ',' {
if !flush() {
return nil, false
}
continue
}
cur = append(cur, s[i])
}
if !flush() {
return nil, false
}
if len(out) == 0 {
return nil, false
}
return out, true
}
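For comparison, the same acceptance rules as decodeFields can be written with strings.Split. This is only a sketch of an alternative, not a proposed change: the `Field` constants are assumed to be the strings "namespace", "pod", and "image" (as the fuzz seeds suggest), and `decodeFieldsSplit` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// Field and its constants are assumed values for illustration.
type Field string

const (
	FieldNamespace Field = "namespace"
	FieldPod       Field = "pod"
	FieldImage     Field = "image"
)

// decodeFieldsSplit (hypothetical) accepts the same inputs as the
// byte-wise decoder: empty segments are tolerated, unknown names fail,
// and an empty result fails.
func decodeFieldsSplit(s string) ([]Field, bool) {
	var out []Field
	for _, part := range strings.Split(s, ",") {
		switch f := Field(part); f {
		case "":
			continue // e.g. "namespace,,pod" or a trailing comma
		case FieldNamespace, FieldPod, FieldImage:
			out = append(out, f)
		default:
			return nil, false
		}
	}
	return out, len(out) > 0
}

func main() {
	fmt.Println(decodeFieldsSplit("namespace,,pod"))
	fmt.Println(decodeFieldsSplit("namespace,bogus"))
	fmt.Println(decodeFieldsSplit(","))
}
```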
+129 -6
@@ -9,6 +9,7 @@ import (
"fmt"
"net"
"sort"
"strings"
"text/template"
)
@@ -118,28 +119,150 @@ protocol bgp upstream4_{{$i}} {
{{end}}{{end}}`
// Render produces the bird.conf text.
//
// The output is deterministic: the same NodeBGP input always produces the
// same string. CIDR lists, anycast lists, and peer lists are sorted before
// templating so that the only way the rendered config changes is when
// semantically meaningful inputs change. This stability matters because
// BirdManager compares Render output against the last-written config to
// avoid superfluous birdc reloads.
//
// Render validates every operator-supplied value that flows into the
// templated output (peer addresses, CIDRs, anycast IPs, source addresses)
// so a malformed NodeConfig or annotation cannot produce a malformed
// bird.conf — even one that BIRD would later reject.
func Render(in NodeBGP) (string, error) {
if in.RouterID == "" {
-return "", fmt.Errorf("RouterID is required")
+return "", fmt.Errorf("bird render: RouterID is required")
}
if net.ParseIP(in.RouterID) == nil {
return "", fmt.Errorf("bird render: RouterID %q is not a valid IP", in.RouterID)
}
if in.LocalASN == 0 {
-return "", fmt.Errorf("LocalASN is required")
+return "", fmt.Errorf("bird render: LocalASN is required")
}
// Stable order — important so config changes only when something real
// changes (avoids needless birdc reloads).
if err := validateLocalSource(in.LocalV6, "v6"); err != nil {
return "", err
}
if err := validateLocalSource(in.LocalV4, "v4"); err != nil {
return "", err
}
for i, p := range in.Peers {
if err := validatePeer(p); err != nil {
return "", fmt.Errorf("bird render: peer[%d]: %w", i, err)
}
}
if err := validateCIDRs(in.CIDR6, "v6"); err != nil {
return "", fmt.Errorf("bird render: cidr6: %w", err)
}
if err := validateCIDRs(in.CIDR4, "v4"); err != nil {
return "", fmt.Errorf("bird render: cidr4: %w", err)
}
if err := validateAnycastIPs(in.Anycast6, "v6"); err != nil {
return "", fmt.Errorf("bird render: anycast6: %w", err)
}
if err := validateAnycastIPs(in.Anycast4, "v4"); err != nil {
return "", fmt.Errorf("bird render: anycast4: %w", err)
}
in = normalize(in)
t, err := template.New("bird").Parse(tpl)
if err != nil {
-return "", err
+return "", fmt.Errorf("bird template parse: %w", err)
}
var buf bytes.Buffer
if err := t.Execute(&buf, in); err != nil {
-return "", err
+return "", fmt.Errorf("bird template execute: %w", err)
}
return buf.String(), nil
}
// validatePeer checks that a peer entry has a parseable IP whose family
// matches its declared Family field, and a non-zero ASN.
func validatePeer(p Peer) error {
if p.ASN == 0 {
return fmt.Errorf("ASN must be non-zero")
}
ip := net.ParseIP(p.Address)
if ip == nil {
return fmt.Errorf("address %q is not a valid IP", p.Address)
}
isV4 := ip.To4() != nil
switch p.Family {
case "v6":
if isV4 {
return fmt.Errorf("address %q is IPv4 but Family is v6", p.Address)
}
case "v4":
if !isV4 {
return fmt.Errorf("address %q is IPv6 but Family is v4", p.Address)
}
default:
return fmt.Errorf("Family %q must be v6 or v4", p.Family)
}
return nil
}
// validateCIDRs parses each entry as a CIDR and rejects family mismatches.
// fam must be "v6" or "v4".
func validateCIDRs(cidrs []string, fam string) error {
for _, c := range cidrs {
_, n, err := net.ParseCIDR(c)
if err != nil {
return fmt.Errorf("invalid CIDR %q: %w", c, err)
}
isV4 := n.IP.To4() != nil
if fam == "v6" && isV4 {
return fmt.Errorf("CIDR %q is IPv4, expected IPv6", c)
}
if fam == "v4" && !isV4 {
return fmt.Errorf("CIDR %q is IPv6, expected IPv4", c)
}
}
return nil
}
// validateAnycastIPs parses each entry as a literal IP (no prefix) and rejects
// family mismatches.
func validateAnycastIPs(ips []string, fam string) error {
for _, s := range ips {
ip := net.ParseIP(s)
if ip == nil {
return fmt.Errorf("invalid IP %q", s)
}
isV4 := ip.To4() != nil
if fam == "v6" && isV4 {
return fmt.Errorf("IP %q is IPv4, expected IPv6", s)
}
if fam == "v4" && !isV4 {
return fmt.Errorf("IP %q is IPv6, expected IPv4", s)
}
}
return nil
}
// validateLocalSource validates an optional LocalV6/LocalV4 source address.
// Empty is allowed (BIRD picks its own); non-empty must be a parseable IP of
// the matching family.
func validateLocalSource(s, fam string) error {
if s == "" {
return nil
}
ip := net.ParseIP(s)
if ip == nil {
return fmt.Errorf("bird render: Local%s %q is not a valid IP", strings.ToUpper(fam), s)
}
isV4 := ip.To4() != nil
if fam == "v6" && isV4 {
return fmt.Errorf("bird render: LocalV6 %q is IPv4", s)
}
if fam == "v4" && !isV4 {
return fmt.Errorf("bird render: LocalV4 %q is IPv6", s)
}
return nil
}
func normalize(in NodeBGP) NodeBGP {
cp := in
cp.CIDR6 = sortedUnique(in.CIDR6)
+93
@@ -0,0 +1,93 @@
package bird
import (
"strings"
"testing"
)
// FuzzRender drives the bird template with a wide range of inputs and
// confirms two safety properties:
//
// 1. Render never panics.
// 2. On nil-error return, the output is deterministic (calling Render
// twice with the same input yields byte-identical output) and contains
// no unbalanced braces (a smoke test for malformed template branches).
func FuzzRender(f *testing.F) {
type seed struct {
routerID string
asn uint32
peerAddr string
peerASN uint32
cidr6 string
cidr4 string
anycast6 string
anycast4 string
localV6 string
localV4 string
}
seeds := []seed{
{routerID: "10.0.0.1", asn: 65101, peerAddr: "2001:db8::1", peerASN: 65000, cidr6: "2001:db8:f001::/64"},
{routerID: "172.25.25.101", asn: 65101, peerAddr: "172.25.25.1", peerASN: 65000, cidr4: "172.25.210.0/24"},
{routerID: "10.0.0.1", asn: 65101, peerAddr: "2001:db8::1", peerASN: 65000, cidr6: "2001:db8:f001::/64", anycast6: "2001:db8:a::1"},
{routerID: "10.0.0.1", asn: 65101, peerAddr: "10.0.0.2", peerASN: 65000, cidr4: "10.0.0.0/24", anycast4: "10.255.0.1"},
{routerID: "10.0.0.1", asn: 65101}, // no peer, no cidrs
{routerID: "", asn: 65101, peerAddr: "10.0.0.2", peerASN: 1}, // empty routerID → expect error
{routerID: "10.0.0.1", asn: 0, peerAddr: "10.0.0.2", peerASN: 1}, // zero ASN → expect error
// Backtick-bearing inputs to defend the template against accidental
// closure of the raw-string literal.
{routerID: "10.0.0.1`", asn: 65101},
// Newlines and template metacharacters in user-supplied addresses.
{routerID: "10.0.0.1", asn: 65101, peerAddr: "2001:db8::1\n{{kaboom}}", peerASN: 65000, cidr6: "2001:db8:f001::/64"},
}
for _, s := range seeds {
f.Add(s.routerID, s.asn, s.peerAddr, s.peerASN, s.cidr6, s.cidr4, s.anycast6, s.anycast4, s.localV6, s.localV4)
}
f.Fuzz(func(t *testing.T, routerID string, asn uint32, peerAddr string, peerASN uint32, cidr6, cidr4, anycast6, anycast4, localV6, localV4 string) {
in := NodeBGP{
RouterID: routerID,
LocalASN: asn,
LocalV6: localV6,
LocalV4: localV4,
}
// Add the peer in whichever family it belongs to, if any. FamilyOf
// returns "" for non-IPs; dropping the peer in that case mirrors the
// "skip unknown family" branch in the bird agent code path.
if fam := FamilyOf(peerAddr); fam != "" {
in.Peers = []Peer{{Family: fam, Address: peerAddr, ASN: peerASN}}
}
if cidr6 != "" {
in.CIDR6 = []string{cidr6}
}
if cidr4 != "" {
in.CIDR4 = []string{cidr4}
}
if anycast6 != "" {
in.Anycast6 = []string{anycast6}
}
if anycast4 != "" {
in.Anycast4 = []string{anycast4}
}
out, err := Render(in)
if err != nil {
return
}
// Determinism.
out2, err := Render(in)
if err != nil {
t.Fatalf("Render became flaky: first ok, second %v", err)
}
if out != out2 {
t.Fatalf("Render not deterministic on identical input")
}
// Smoke test for balanced braces. The template uses `{` and `}`
// as BIRD's block delimiters; if our template engine ever
// produced an unbalanced output we'd catch it here.
if got := strings.Count(out, "{") - strings.Count(out, "}"); got != 0 {
t.Fatalf("unbalanced braces: %d", got)
}
})
}
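The harness above relies on FamilyOf to classify the peer address, and the validators in Render use `ip.To4() != nil` for the same purpose. FamilyOf's implementation is not part of this diff, so the sketch below uses a hypothetical `familyOfSketch` built on that To4 test. One consequence worth knowing: an IPv4-mapped IPv6 literal such as `::ffff:10.0.0.2` has a non-nil To4 and is therefore treated as v4.

```go
package main

import (
	"fmt"
	"net"
)

// familyOfSketch is a hypothetical stand-in for the package's FamilyOf,
// assuming it classifies with the same To4 test the validators use.
func familyOfSketch(addr string) string {
	ip := net.ParseIP(addr)
	switch {
	case ip == nil:
		return "" // not an IP literal: the fuzz harness adds no peer
	case ip.To4() != nil:
		return "v4"
	default:
		return "v6"
	}
}

func main() {
	for _, a := range []string{
		"10.0.0.2",
		"2001:db8::1",
		"::ffff:10.0.0.2",         // IPv4-mapped: classified as v4
		"2001:db8::1\n{{kaboom}}", // seed input: not an IP
	} {
		fmt.Printf("%q -> %q\n", a, familyOfSketch(a))
	}
}
```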
@@ -0,0 +1,11 @@
go test fuzz v1
string("0")
uint32(65101)
string("0")
uint32(1)
string("")
string("")
string("")
string("}")
string("")
string("")
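Read in f.Add argument order, the seed above sets routerID and peerAddr to "0" and anycast4 to "}". The sketch below shows why the new validation stops this input: neither string parses as an IP, and an unvalidated "}" dropped into a brace-delimited block trips the same balance check FuzzRender uses. The BIRD-style fragment is a made-up illustration and not the project's template; `rejected` and `braceDelta` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// rejected reports whether s fails the net.ParseIP check that Render
// now applies to RouterID and anycast addresses.
func rejected(s string) bool {
	return net.ParseIP(s) == nil
}

// braceDelta is the smoke test from FuzzRender: opening minus closing braces.
func braceDelta(s string) int {
	return strings.Count(s, "{") - strings.Count(s, "}")
}

func main() {
	// Values taken from the seed file, in f.Add argument order.
	routerID, anycast4 := "0", "}"

	fmt.Println(rejected(routerID)) // true: "0" is not an IP literal
	fmt.Println(rejected(anycast4)) // true

	// Hypothetical fragment showing what an unvalidated "}" would do.
	fragment := "protocol static anycast4 {\n  route " + anycast4 + "/32 blackhole;\n}\n"
	fmt.Println(braceDelta(fragment)) // -1: one closing brace too many
}
```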