Compare commits

2 commits: `39ede9130b`, `71e584cf96`

+1 -1
```diff
@@ -21,7 +21,7 @@ RUN CGO_ENABLED=0 go build -trimpath \
     -o /out/flock-installer ./cmd/flock-installer
 
 FROM alpine:3.21
-RUN apk add --no-cache iproute2 bird ca-certificates
+RUN apk add --no-cache iproute2 bird nftables ca-certificates
 COPY --from=build /out/flock /usr/local/bin/flock
 COPY --from=build /out/flock-agent /usr/local/bin/flock-agent
 COPY --from=build /out/flock-installer /usr/local/bin/flock-installer
```
```diff
@@ -1,22 +1,335 @@
 # flock
 
-Kubernetes CNI for sjc001. Per-pod IPv4 opt-in, IID embedding, Ready-gated anycast via BGP.
+A small, opinionated Kubernetes CNI built around three ideas:
 
-Design doc: `k8s-manager/dfritz-cni.md` (in the operator's k8s-manager repo).
+1. **IPv6-first.** Every pod gets a globally routable IPv6 address. IPv4 is
+   per-pod opt-in for legacy clients.
+2. **No tunnels, no NAT.** Pod addresses are the real packets on the wire.
+   Each node speaks BGP to its upstream router and advertises its own
+   per-node prefix. The pod network is just the LAN, plus host routes.
+3. **Anycast as a primitive.** A pod can request an anycast address via
+   an annotation; flock binds it on the pod's loopback and advertises a
+   `/128` (or `/32`) over BGP, but only while the pod is `Ready`. Multiple
+   replicas advertise the same address from different nodes for ECMP load
+   balancing without a separate Service or external LB.
 
-Status: M1 scaffold. Not functional. See milestones table in the design doc.
+flock is built for clusters where every node already speaks BGP to one
+or more upstream routers. It deliberately leaves out features you'd
+expect from a general-purpose CNI — overlays, IPsec/Wireguard, IPAM
+coordination across nodes, kube-proxy integration — so the moving parts
+that remain are easy to reason about.
 
-## Layout
+> **Status:** alpha. CRD shape and annotation keys may still change.
 
-- `cmd/flock` — CNI plugin binary (kubelet-invoked)
-- `cmd/flock-agent` — DaemonSet binary
-- `pkg/api/v1alpha1` — `NodeConfig` CRD types
-- `pkg/cni` — CNI plugin internals + RPC client
-- `pkg/agent` — agent server, IPAM, state file, anycast, NetworkPolicy
-- `pkg/embed` — `ip-algo` IID embedding (pure)
-- `pkg/routing/{bird,ospf}` — routing backends
-- `deploy/` — CRDs, RBAC, DaemonSet manifests
```

## Table of contents

- [How it works](#how-it-works)
- [Requirements](#requirements)
- [Quickstart](#quickstart)
- [NodeConfig CRD](#nodeconfig-crd)
- [Pod annotations](#pod-annotations)
- [Use cases](#use-cases)
- [Comparison vs Calico / Cilium](#comparison-vs-calico--cilium)
- [Limitations and non-goals](#limitations-and-non-goals)
- [Building and testing](#building-and-testing)
- [License](#license)

## How it works

Each node runs a single `flock-agent` DaemonSet pod with three containers:

- a privileged init container (`flock-installer`) that drops the CNI
  plugin binary into `/opt/cni/bin/flock` and writes
  `/etc/cni/net.d/01-flock.conflist`,
- the agent itself, which owns IPAM, programs veth pairs, and tracks
  pod readiness, and
- a [BIRD2](https://bird.network.cz/) sidecar whose configuration the agent
  re-renders and reloads when the per-node config or the active anycast set
  changes.

Each node has a `NodeConfig` CR (cluster-scoped, name = node name) that
declares its IPv6 and IPv4 prefixes, its local BGP ASN, and its upstream
peers. The agent reads the CR via a dynamic informer.

When kubelet runs the CNI plugin on `ADD`, the plugin opens a unix-socket
RPC to the agent. The agent allocates an address from the per-node
CIDRs, creates a veth pair, configures the pod side, persists the
allocation to `/var/lib/flock/allocations.json`, and returns the result.
There is no controller loop and no IPAM coordination across nodes. Each
node owns a non-overlapping CIDR and allocates locally.

For anycast, the agent installs `<anycast-ip> via <pod-eth0-ip> dev <veth>`
host routes on the node and adds the anycast IP to BIRD's BGP export
filter. When a pod loses readiness, the agent withdraws the route from
both the kernel and BGP within one reconcile cycle (sub-second).

### Packet path

`pod.eth0` (a veth) ↔ host-side veth (with `addrgenmode none`,
`fe80::1/64`, proxy-ARP for the v4 default-via) ↔ host kernel ↔ uplink
NIC ↔ upstream router. No conntrack, no SNAT, no encapsulation.

For IPv6 the host side of every veth carries the deterministic link-local
gateway `fe80::1`, so every pod can use a fixed default route. For IPv4
the host side answers ARP for `169.254.1.1`, providing the same fixed
default route in v4.

## Requirements

- Linux nodes. flock has not been tested on, and does not target,
  Windows nodes.
- Kubernetes ≥ 1.27.
- An upstream router (or pair) that accepts a BGP session from each
  node. flock has been tested with Cisco IOS-XE, Arista EOS, and FRR
  acting as the upstream; anything that speaks standard eBGP should work.
- A globally routable (or at least datacentre-routable) IPv6 prefix
  delegated to the cluster, sliced into a per-node /64. IPv4 is
  optional but supported.
- Each node must have a unique local ASN. Private ASNs (`64512–65534`,
  `4200000000–4294967294`) are typical.
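
A small pre-flight check could be run over planned NodeConfigs, assuming the node/ASN pairs are already collected into a slice. `isPrivateASN` encodes the RFC 6996 private-use ranges quoted above. `nodeASN` and `duplicateASNs` are hypothetical helpers that enforce the uniqueness requirement. Whether to reject public ASNs is left to the caller, since private ASNs are described as typical rather than mandatory.

```go
package main

import "fmt"

// isPrivateASN reports whether asn is in one of the RFC 6996 private-use
// ranges: 64512-65534 (16-bit) or 4200000000-4294967294 (32-bit).
func isPrivateASN(asn uint32) bool {
	return (asn >= 64512 && asn <= 65534) ||
		(asn >= 4200000000 && asn <= 4294967294)
}

// nodeASN pairs a node name with the ASN planned for its NodeConfig.
type nodeASN struct {
	Node string
	ASN  uint32
}

// duplicateASNs returns one message per node whose ASN was already used by
// an earlier node in the list; every node's local ASN must be unique.
func duplicateASNs(nodes []nodeASN) []string {
	owner := make(map[uint32]string)
	var dups []string
	for _, n := range nodes {
		if prev, ok := owner[n.ASN]; ok {
			dups = append(dups, fmt.Sprintf("%s reuses ASN %d from %s", n.Node, n.ASN, prev))
			continue
		}
		owner[n.ASN] = n.Node
	}
	return dups
}

func main() {
	plan := []nodeASN{{"node-a", 65101}, {"node-b", 65102}, {"node-c", 65101}}
	for _, n := range plan {
		fmt.Printf("%s ASN %d private=%v\n", n.Node, n.ASN, isPrivateASN(n.ASN))
	}
	fmt.Println(duplicateASNs(plan))
}
```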

## Quickstart

```sh
# 1. Install CRD + RBAC + DaemonSet (single bundled manifest):
kubectl apply -f deploy/install.yaml

# 2. Label the node(s) you want flock to manage:
kubectl label node <node-name> flock.fritzlab.net/agent=

# 3. Apply a NodeConfig CR for that node (see "NodeConfig CRD" below):
kubectl apply -f my-nodeconfig.yaml

# 4. Verify the agent is up:
kubectl -n kube-system get pod -l app=flock-agent -o wide
# ds/flock-agent resolves to one agent pod; name a pod explicitly to check a specific node.
kubectl -n kube-system exec -it ds/flock-agent -c bird -- \
  birdc -s /run/flock/bird.ctl show protocols
```

The DaemonSet is gated by the `flock.fritzlab.net/agent` node label, so
unlabelled nodes continue to use whatever CNI was installed before. This
lets you migrate node-by-node: start with one node, prove it works, then
proceed.

## NodeConfig CRD

A `NodeConfig` is the only operator-supplied input: one per node, with a
name matching the node name. Example:

```yaml
apiVersion: flock.fritzlab.net/v1alpha1
kind: NodeConfig
metadata:
  name: node-a
spec:
  cidr6:
    - 2001:db8:f001::/64      # Pods on this node get addresses from here.
  cidr4:
    - 192.0.2.0/24            # IPv4 pool, used only when a pod opts in.
  defaults:
    ipv6: true                # Optional. Built-in baseline if omitted.
    ipv4: false               # Optional. Built-in baseline if omitted.
  bgp:
    asn: 65101                # This node's local ASN.
    peers:
      - address: 2001:db8::1  # Upstream router (IPv6 session).
        asn: 65000
      - address: 192.0.2.1    # Same router, IPv4 session.
        asn: 65000
```

### `spec.defaults`

`spec.defaults` controls which address families a pod *gets by default*
on this node, i.e. when the pod has no explicit `flock.fritzlab.net/ipv6`
or `flock.fritzlab.net/ipv4` annotation. Pod annotations always override.
If you omit `spec.defaults` (or any individual field inside it), flock
falls back to its built-in baseline of **IPv6 on, IPv4 off**.

| Goal                    | `spec.defaults`                        |
|-------------------------|----------------------------------------|
| IPv6-only (the default) | omit, or `{ ipv6: true, ipv4: false }` |
| Dual-stack by default   | `{ ipv6: true, ipv4: true }`           |
| IPv4-only (legacy node) | `{ ipv6: false, ipv4: true }`          |

If a pod's annotations plus the node defaults resolve to "neither family",
the pod is rejected at allocation time. Setting both defaults to false is
therefore not caught when the NodeConfig is applied. It surfaces as an error
on the first `CNI ADD` for a pod that doesn't opt into a family via
annotation.
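
The precedence rules above can be sketched with optional booleans, where a nil pointer stands for an omitted field or annotation. `familySettings`, `resolveFamilies` and `boolPtr` are hypothetical stand-ins, not flock's types. The sketch shows why a node with both defaults off still admits a pod that opts into a family.

```go
package main

import (
	"errors"
	"fmt"
)

// familySettings models an optional pair of booleans, as in spec.defaults or
// a pod's flock.fritzlab.net/ipv6 and /ipv4 annotations: nil means "not set".
// Hypothetical stand-in, not flock's API.
type familySettings struct {
	IPv6, IPv4 *bool
}

// resolveFamilies applies the built-in baseline (IPv6 on, IPv4 off), then
// the node defaults, then the pod annotations. It fails if the result
// selects no address family at all.
func resolveFamilies(node, pod familySettings) (v6, v4 bool, err error) {
	v6, v4 = true, false
	for _, layer := range []familySettings{node, pod} {
		if layer.IPv6 != nil {
			v6 = *layer.IPv6
		}
		if layer.IPv4 != nil {
			v4 = *layer.IPv4
		}
	}
	if !v6 && !v4 {
		return false, false, errors.New("no address family selected")
	}
	return v6, v4, nil
}

func boolPtr(b bool) *bool { return &b }

func main() {
	bothOff := familySettings{IPv6: boolPtr(false), IPv4: boolPtr(false)}
	fmt.Println(resolveFamilies(familySettings{}, familySettings{}))
	fmt.Println(resolveFamilies(bothOff, familySettings{}))
	fmt.Println(resolveFamilies(bothOff, familySettings{IPv4: boolPtr(true)}))
}
```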

### `spec.bgp`

Each `peer` becomes one BGP session. The agent picks a node-local source
address on the same subnet as the peer; if there isn't one, BIRD uses
its default. Multi-homing (multiple peers per family, e.g. one per router
in an upstream pair) is allowed.

## Pod annotations

All annotations live under `flock.fritzlab.net/`, and every one is optional.
Without `ipv6`/`ipv4` a pod inherits the node's family defaults. Without the
others it gets an address from anywhere in the node's CIDRs, a random IID,
and no anycast.

| Annotation                   | Type  | Purpose                                                                                        |
|------------------------------|-------|------------------------------------------------------------------------------------------------|
| `flock.fritzlab.net/ipv6`    | bool  | Override `spec.defaults.ipv6` for this pod (`true`/`false`).                                   |
| `flock.fritzlab.net/ipv4`    | bool  | Override `spec.defaults.ipv4` for this pod (`true`/`false`).                                   |
| `flock.fritzlab.net/cidr6`   | CIDRs | Restrict IPv6 allocation to a sub-range of the node's `cidr6`. Comma-separated.                |
| `flock.fritzlab.net/cidr4`   | CIDRs | Restrict IPv4 allocation to a sub-range of the node's `cidr4`. Comma-separated.                |
| `flock.fritzlab.net/ip-algo` | list  | Embed identity into the IPv6 IID. Subset of `namespace,pod,image`, in order, comma-separated.  |
| `flock.fritzlab.net/anycast` | IPs   | Bind these IPs on the pod's `lo`; advertise via BGP while the pod is `Ready`. Mixed v6+v4 ok.  |

Bool values must be the literal strings `"true"` or `"false"`
(case-insensitive, surrounding whitespace tolerated). Any other value,
including `1`, `0`, `yes` and `no`, is rejected so a typo can't silently
flip behaviour.

### Example pods

Default IPv6-only (no annotations needed):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: minimal
```

Dual-stack on a node whose default is IPv6-only:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-client
  annotations:
    flock.fritzlab.net/ipv4: "true"
```

Operator-friendly addressing, with `fnv(namespace) | fnv(pod) | random`
packed into the host bits so a pod's identity is recognisable from
its IP in `kubectl get pods -o wide`:

```yaml
metadata:
  annotations:
    flock.fritzlab.net/ip-algo: "namespace,pod"
```

Anycast service: three replicas, each advertising the same v6+v4
anycast pair from the node it lands on. The upstream router does ECMP
across the active set:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template's labels.
  selector:
    matchLabels:
      app: dns
  template:
    metadata:
      labels:
        app: dns
      annotations:
        flock.fritzlab.net/ipv4: "true"
        flock.fritzlab.net/anycast: "2001:db8:a::53, 192.0.2.53"
    spec:
      containers:
        - name: coredns
          image: coredns/coredns
          readinessProbe:
            httpGet: { path: /ready, port: 8181 }
            periodSeconds: 1
            failureThreshold: 1
```

## Use cases

**Highly available DNS.** Run N CoreDNS replicas, each annotated with
the same `anycast` IP. Point client `/etc/resolv.conf` at the anycast
address. Each replica advertises a `/128` from its own node; the
upstream router does ECMP. Lose a pod, and traffic fails over within a
probe cycle.

**Replacing a kube-proxy `ClusterIP`.** A headless Service plus an anycast
IP gives you a single stable address with load-balancing across pods,
without the DNAT-pinning that makes long-lived TCP keepalive connections
stick to one backend forever. ECMP routes each new flow independently.

**Per-pod public IPv6.** Because every pod has a globally routable IPv6
address and the cluster does no NAT, a pod's `eth0` IP is reachable from
the rest of the internet (subject to your firewall). Useful for things
like outgoing SMTP, where you want a stable from-address per pod, or for
peer-to-peer protocols that don't tolerate NAT.

**Fast pod identification in `kubectl`.** With
`flock.fritzlab.net/ip-algo: namespace,pod` the IPv6 host bits carry
hashes of the pod's namespace and name, so a familiar pod is recognisable
from its IP without a lookup. Reverse-DNS via a wildcard zone makes those
IPs human-readable too.

**Static-IP migration.** Annotation-driven address allocation means you
can ask for a specific sub-CIDR (`cidr6: 2001:db8:f001::ab00/120`) for
services that previously needed pinned IPs (mail server, ingress
controller). When the static-IP requirement goes away, drop the
annotation and the pod gets a normal allocation.

## Comparison vs Calico / Cilium

|                          | flock                    | Calico                | Cilium                                                 |
|--------------------------|--------------------------|-----------------------|--------------------------------------------------------|
| Default address family   | IPv6                     | IPv4                  | IPv4 (dual-stack optional)                             |
| BGP                      | yes (BIRD)               | yes                   | optional                                               |
| Overlay (VXLAN/IPIP)     | never                    | optional              | yes (VXLAN by default, Geneve optional) or native routing |
| NAT in datapath          | never                    | masquerade by default | masquerade by default                                  |
| Anycast pod addressing   | first-class              | manual                | optional, via service mesh                             |
| eBPF datapath            | no                       | optional              | yes                                                    |
| NetworkPolicy            | not yet                  | yes (Felix)           | yes (eBPF)                                             |
| Cluster size target      | small (< 100 nodes)      | thousands             | thousands                                              |
| Operational surface area | low (1 DaemonSet, 1 CRD) | medium                | high                                                   |
| Production-ready         | alpha                    | yes                   | yes                                                    |

flock is not trying to compete with Calico or Cilium. The right answer
for most clusters is one of those two. flock exists for clusters where
every node already speaks BGP, the operator wants to think in IPv6-first
terms, and per-pod anycast is something they actually want to use rather
than work around.

## Limitations and non-goals

- No NetworkPolicy enforcement yet (planned).
- No NAT, no masquerade, no SNAT-egress. If your pods need to reach a
  legacy IPv4-only destination, give them an IPv4 address explicitly.
- No multi-cluster, no peering across clusters.
- Linux-only datapath.
- IPAM is per-node — there's no global allocator and no IP mobility.
  When a pod moves to a different node it gets a new address.
- The agent is privileged. It mounts `/var/run/netns`, configures veth
  pairs, manages kernel routes, and holds `CAP_NET_ADMIN`. This is
  inherent to being a CNI; reducing privilege further is not a goal.
- If BIRD dies but the agent stays up, pods on that node stop being
  reachable from off-node. The DaemonSet liveness probes catch this.

## Building and testing

```sh
# Unit tests + fuzz seed corpora (fast, ~1s):
go test ./...

# Targeted fuzz pass:
go test -run NEVERMATCH -fuzz=FuzzParseAnnotations -fuzztime=30s ./pkg/agent
go test -run NEVERMATCH -fuzz=FuzzRender -fuzztime=30s ./pkg/routing/bird
go test -run NEVERMATCH -fuzz=FuzzEmbed -fuzztime=30s ./pkg/embed
go test -run NEVERMATCH -fuzz=FuzzIPAM_Allocate -fuzztime=30s ./pkg/agent

# Build the container image (used by the DaemonSet):
docker build -t flock:dev .
```

The fuzz tests are also run as plain unit tests via their seed corpora,
so every `go test ./...` exercises the discovered edge cases as
regressions.

`pkg/agent` has Linux-only files (`*_linux.go`) for netlink and netns
work; the macOS/Windows build pulls in stubs from `*_stub.go` so tests
run cleanly on developer laptops.

## License

```diff
-Apache 2.0.
+Apache 2.0 — see [LICENSE](LICENSE).
```

```diff
@@ -20,6 +20,9 @@ spec:
         openAPIV3Schema:
           type: object
           required: [spec]
+          description: |
+            NodeConfig is the per-node operator-supplied configuration for the
+            flock CNI agent. Its name MUST equal the Kubernetes node name.
           properties:
             spec:
               type: object
@@ -35,6 +38,25 @@ spec:
                   items:
                     type: string
                   description: IPv4 CIDR owned and aggregate-advertised by this node.
+                defaults:
+                  type: object
+                  description: |
+                    Per-node baseline for which address families a pod receives
+                    when its own annotations don't specify. Pod annotations
+                    flock.fritzlab.net/ipv6 and flock.fritzlab.net/ipv4 always
+                    override these defaults. Built-in fallback (when this block
+                    or any field is omitted) is IPv6=true, IPv4=false.
+                  properties:
+                    ipv6:
+                      type: boolean
+                      description: |
+                        Default IPv6 inclusion for pods on this node. Omit to
+                        inherit the built-in baseline (true).
+                    ipv4:
+                      type: boolean
+                      description: |
+                        Default IPv4 inclusion for pods on this node. Omit to
+                        inherit the built-in baseline (false).
                 bgp:
                   type: object
                   required: [asn, peers]
@@ -70,3 +92,9 @@ spec:
         - name: CIDR4
           type: string
           jsonPath: .spec.cidr4
+        - name: DefV6
+          type: boolean
+          jsonPath: .spec.defaults.ipv6
+        - name: DefV4
+          type: boolean
+          jsonPath: .spec.defaults.ipv4
```

```diff
@@ -20,6 +20,9 @@ spec:
         openAPIV3Schema:
           type: object
           required: [spec]
+          description: |
+            NodeConfig is the per-node operator-supplied configuration for the
+            flock CNI agent. Its name MUST equal the Kubernetes node name.
           properties:
             spec:
               type: object
@@ -35,6 +38,25 @@ spec:
                   items:
                     type: string
                   description: IPv4 CIDR owned and aggregate-advertised by this node.
+                defaults:
+                  type: object
+                  description: |
+                    Per-node baseline for which address families a pod receives
+                    when its own annotations don't specify. Pod annotations
+                    flock.fritzlab.net/ipv6 and flock.fritzlab.net/ipv4 always
+                    override these defaults. Built-in fallback (when this block
+                    or any field is omitted) is IPv6=true, IPv4=false.
+                  properties:
+                    ipv6:
+                      type: boolean
+                      description: |
+                        Default IPv6 inclusion for pods on this node. Omit to
+                        inherit the built-in baseline (true).
+                    ipv4:
+                      type: boolean
+                      description: |
+                        Default IPv4 inclusion for pods on this node. Omit to
+                        inherit the built-in baseline (false).
                 bgp:
                   type: object
                   required: [asn, peers]
@@ -70,6 +92,12 @@ spec:
         - name: CIDR4
           type: string
           jsonPath: .spec.cidr4
+        - name: DefV6
+          type: boolean
+          jsonPath: .spec.defaults.ipv6
+        - name: DefV4
+          type: boolean
+          jsonPath: .spec.defaults.ipv4
 ---
 apiVersion: v1
 kind: ServiceAccount
```
+175
-46
@@ -5,77 +5,153 @@ import (
|
||||
"net"
|
||||
"strings"
|
||||
|
||||
flockv1alpha1 "code.fritzlab.net/fritzlab/flock/pkg/api/v1alpha1"
|
||||
"code.fritzlab.net/fritzlab/flock/pkg/embed"
|
||||
)
|
||||
|
||||
// annotationPrefix is the namespace under which all flock pod annotations
|
||||
// live. Anything not starting with this prefix is ignored by the parser.
|
||||
const annotationPrefix = "flock.fritzlab.net/"
|
||||
|
||||
// ParsedAnnotations is the typed view of a Pod's flock annotations.
|
||||
type ParsedAnnotations struct {
|
||||
WantV6 bool
|
||||
WantV4 bool
|
||||
CIDR6 []*net.IPNet
|
||||
CIDR4 []*net.IPNet
|
||||
IPAlgo []embed.Field
|
||||
Anycast []net.IP
|
||||
// Recognised annotation keys (without the prefix).
|
||||
const (
|
||||
annIPv6 = "ipv6"
|
||||
annIPv4 = "ipv4"
|
||||
annCIDR6 = "cidr6"
|
||||
annCIDR4 = "cidr4"
|
||||
annIPAlgo = "ip-algo"
|
||||
annAnycast = "anycast"
|
||||
)
|
||||
|
||||
// FamilyDefaults is the per-call baseline for whether a pod receives an IPv6
|
||||
// and/or IPv4 address. It is the merge of:
|
||||
//
|
||||
// 1. flock's built-in baseline (IPv6=true, IPv4=false), then
|
||||
// 2. any NodeConfig.Spec.Defaults override the operator has applied to
|
||||
// the local node.
|
||||
//
|
||||
// Pod-level `flock.fritzlab.net/ipv{6,4}` annotations override this baseline.
|
||||
//
|
||||
// Use FamilyDefaultsFromNodeConfig to compute a value from a NodeConfig,
|
||||
// or BuiltinFamilyDefaults() if no NodeConfig is in scope.
|
||||
type FamilyDefaults struct {
|
||||
// WantV6 is the default-on value for IPv6 inclusion when the pod has no
|
||||
// explicit ipv6 annotation.
|
||||
WantV6 bool
|
||||
// WantV4 is the default-on value for IPv4 inclusion when the pod has no
|
||||
// explicit ipv4 annotation.
|
||||
WantV4 bool
|
||||
}
|
||||
|
||||
// ParseAnnotations applies the design-doc defaults (ipv6=true, ipv4=false)
|
||||
// and validates the post-merge combination.
|
||||
func ParseAnnotations(in map[string]string) (*ParsedAnnotations, error) {
|
||||
out := &ParsedAnnotations{WantV6: true, WantV4: false}
|
||||
// BuiltinFamilyDefaults returns flock's hard-coded fallback: IPv6 only.
|
||||
// This is the policy applied when no NodeConfig override is in effect.
|
||||
//
|
||||
// We define it as a function rather than a var so callers can't mutate the
|
||||
// shared baseline at runtime.
|
||||
func BuiltinFamilyDefaults() FamilyDefaults {
|
||||
return FamilyDefaults{WantV6: true, WantV4: false}
|
||||
}
|
||||
|
||||
if v, ok := in[annotationPrefix+"ipv6"]; ok {
|
||||
switch strings.ToLower(strings.TrimSpace(v)) {
|
||||
case "true":
|
||||
out.WantV6 = true
|
||||
case "false":
|
||||
out.WantV6 = false
|
||||
default:
|
||||
return nil, fmt.Errorf("annotation ipv6=%q: must be true or false", v)
|
||||
}
|
||||
// FamilyDefaultsFromNodeConfig resolves the effective per-node defaults,
|
||||
// falling back to BuiltinFamilyDefaults for any field the NodeConfig leaves
|
||||
// unset. A nil NodeConfig (or nil Spec.Defaults) returns the built-in
|
||||
// baseline unchanged.
|
||||
func FamilyDefaultsFromNodeConfig(nc *flockv1alpha1.NodeConfig) FamilyDefaults {
|
||||
out := BuiltinFamilyDefaults()
|
||||
if nc == nil || nc.Spec.Defaults == nil {
|
||||
return out
|
||||
}
|
||||
if v, ok := in[annotationPrefix+"ipv4"]; ok {
|
||||
switch strings.ToLower(strings.TrimSpace(v)) {
|
||||
case "true":
|
||||
out.WantV4 = true
|
||||
case "false":
|
||||
out.WantV4 = false
|
||||
default:
|
||||
return nil, fmt.Errorf("annotation ipv4=%q: must be true or false", v)
|
||||
if nc.Spec.Defaults.IPv6 != nil {
|
||||
out.WantV6 = *nc.Spec.Defaults.IPv6
|
||||
}
|
||||
if nc.Spec.Defaults.IPv4 != nil {
|
||||
out.WantV4 = *nc.Spec.Defaults.IPv4
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
// ParsedAnnotations is the typed view of a pod's flock annotations after the
|
||||
// node-level defaults have been merged in. All slices are non-nil only when
|
||||
// the corresponding annotation was present and parsed cleanly.
|
||||
type ParsedAnnotations struct {
|
||||
// WantV6 is true when the pod should receive an IPv6 address.
|
||||
WantV6 bool
|
||||
// WantV4 is true when the pod should receive an IPv4 address.
|
||||
WantV4 bool
|
||||
// CIDR6 narrows IPv6 allocation to specific operator-approved sub-ranges
|
||||
// of the node's CIDR6 set. nil/empty means "use any node CIDR6".
|
||||
CIDR6 []*net.IPNet
|
||||
// CIDR4 narrows IPv4 allocation. nil/empty means "use any node CIDR4".
|
||||
CIDR4 []*net.IPNet
|
||||
// IPAlgo is the ordered list of identity fields used to build the IID.
|
||||
// nil/empty means "random IID".
|
||||
IPAlgo []embed.Field
|
||||
// Anycast is the set of anycast IPs to bind on the pod's loopback.
|
||||
// nil/empty means "no anycast".
|
||||
Anycast []net.IP
|
||||
}
|
||||
|
||||
// ParseAnnotations applies the supplied per-node defaults and validates the
|
||||
// post-merge combination. It is pure — it does not consult NodeConfig or any
|
||||
// global state — so it is safe to call from tests and fuzz targets.
|
||||
//
|
||||
// Annotation precedence: pod annotation > FamilyDefaults > built-in baseline.
|
||||
// Callers compute FamilyDefaults via FamilyDefaultsFromNodeConfig and pass it
|
||||
// in.
|
||||
//
|
||||
// Errors:
|
||||
// - any unknown ipv6/ipv4 value (must be "true" or "false", case-insensitive)
|
||||
// - any malformed cidr6/cidr4/anycast/ip-algo value
|
||||
// - the post-merge combination resolves to neither IPv6 nor IPv4 (a pod
|
||||
// must have at least one address)
|
||||
func ParseAnnotations(in map[string]string, defaults FamilyDefaults) (*ParsedAnnotations, error) {
|
||||
out := &ParsedAnnotations{WantV6: defaults.WantV6, WantV4: defaults.WantV4}
|
||||
|
||||
if v, ok := in[annotationPrefix+annIPv6]; ok {
|
||||
b, err := parseBoolAnnotation(annIPv6, v)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
out.WantV6 = b
|
||||
}
|
||||
if v, ok := in[annotationPrefix+annIPv4]; ok {
|
||||
b, err := parseBoolAnnotation(annIPv4, v)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
out.WantV4 = b
|
||||
}
|
||||
if !out.WantV6 && !out.WantV4 {
|
||||
return nil, fmt.Errorf("ipv6=false requires ipv4=true (pod must have at least one address)")
|
||||
return nil, fmt.Errorf("annotations + defaults resolve to no address family (need at least one of ipv6/ipv4)")
|
||||
}
|
||||
|
||||
if v, ok := in[annotationPrefix+"cidr6"]; ok {
|
||||
nets, err := parseCIDRList(v)
|
||||
if v, ok := in[annotationPrefix+annCIDR6]; ok {
|
||||
nets, err := parseCIDRList(v, familyV6)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("annotation cidr6: %w", err)
|
||||
return nil, fmt.Errorf("annotation %s: %w", annCIDR6, err)
|
||||
}
|
||||
out.CIDR6 = nets
|
||||
}
|
||||
if v, ok := in[annotationPrefix+"cidr4"]; ok {
|
||||
nets, err := parseCIDRList(v)
|
||||
if v, ok := in[annotationPrefix+annCIDR4]; ok {
|
||||
nets, err := parseCIDRList(v, familyV4)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("annotation cidr4: %w", err)
|
||||
return nil, fmt.Errorf("annotation %s: %w", annCIDR4, err)
|
||||
}
|
||||
out.CIDR4 = nets
|
||||
}
|
||||
|
||||
if v, ok := in[annotationPrefix+"ip-algo"]; ok {
|
||||
if v, ok := in[annotationPrefix+annIPAlgo]; ok {
|
||||
fields, err := parseIPAlgo(v)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("annotation ip-algo: %w", err)
|
||||
return nil, fmt.Errorf("annotation %s: %w", annIPAlgo, err)
|
||||
}
|
||||
out.IPAlgo = fields
|
||||
}
|
||||
|
||||
if v, ok := in[annotationPrefix+"anycast"]; ok {
|
||||
if v, ok := in[annotationPrefix+annAnycast]; ok {
|
||||
ips, err := parseIPList(v)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("annotation anycast: %w", err)
|
||||
return nil, fmt.Errorf("annotation %s: %w", annAnycast, err)
|
||||
}
|
||||
out.Anycast = ips
|
||||
}
|
||||
@@ -83,7 +159,39 @@ func ParseAnnotations(in map[string]string) (*ParsedAnnotations, error) {
|
||||
return out, nil
|
||||
}
|
||||
|
||||
func parseCIDRList(s string) ([]*net.IPNet, error) {
|
||||
// parseBoolAnnotation accepts only "true" or "false" (case-insensitive,
|
||||
// surrounding whitespace tolerated). All other values — including "1", "0",
|
||||
// "yes", "no" — are rejected so operator typos are caught loudly rather
|
||||
// than silently producing the "false" default.
|
||||
func parseBoolAnnotation(key, v string) (bool, error) {
|
||||
switch strings.ToLower(strings.TrimSpace(v)) {
|
||||
case "true":
|
||||
return true, nil
|
||||
case "false":
|
||||
return false, nil
|
||||
default:
|
||||
return false, fmt.Errorf("annotation %s=%q: must be \"true\" or \"false\"", key, v)
|
||||
}
|
||||
}
|
||||
|
||||
// addressFamily distinguishes IPv6 vs IPv4 in places where the parser must
|
||||
// validate the family of supplied CIDRs.
|
||||
type addressFamily int
|
||||
|
||||
const (
|
||||
familyAny addressFamily = iota
|
||||
familyV6
|
||||
familyV4
|
||||
)
|
||||
|
||||
// parseCIDRList parses a comma-separated CIDR list. Whitespace around items
|
||||
// is trimmed; empty items are silently dropped. The list must contain at
|
||||
// least one entry post-trim.
|
||||
//
|
||||
// If `want` is familyV6 or familyV4 each entry's family is checked and a
|
||||
// mismatch is reported, so an `flock.fritzlab.net/cidr6` annotation cannot
|
||||
// silently slip a v4 prefix into the v6 allocator.
|
||||
func parseCIDRList(s string, want addressFamily) ([]*net.IPNet, error) {
|
||||
var out []*net.IPNet
|
||||
for _, part := range strings.Split(s, ",") {
|
||||
part = strings.TrimSpace(part)
|
||||
@@ -94,6 +202,17 @@ func parseCIDRList(s string) ([]*net.IPNet, error) {
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("invalid CIDR %q: %w", part, err)
|
||||
}
|
||||
isV4 := n.IP.To4() != nil
|
||||
switch want {
|
||||
case familyV6:
|
||||
if isV4 {
|
||||
return nil, fmt.Errorf("CIDR %q is IPv4, expected IPv6", part)
|
||||
}
|
||||
case familyV4:
|
||||
if !isV4 {
|
||||
return nil, fmt.Errorf("CIDR %q is IPv6, expected IPv4", part)
|
||||
}
|
||||
}
|
||||
out = append(out, n)
|
||||
}
|
||||
if len(out) == 0 {
|
||||
@@ -102,6 +221,9 @@ func parseCIDRList(s string) ([]*net.IPNet, error) {
|
||||
return out, nil
|
||||
}
|
||||
|
// parseIPList parses a comma-separated literal-IP list. Same trim/empty
// semantics as parseCIDRList. Mixed v4 and v6 entries are allowed (anycast
// pods can advertise both families together).
func parseIPList(s string) ([]net.IP, error) {
var out []net.IP
for _, part := range strings.Split(s, ",") {
@@ -121,6 +243,9 @@ func parseIPList(s string) ([]net.IP, error) {
return out, nil
}

// parseIPAlgo parses the ip-algo annotation. Each comma-separated token must
// match one of: namespace, pod, image. Empty tokens are dropped; unknown
// tokens are reported.
func parseIPAlgo(s string) ([]embed.Field, error) {
var out []embed.Field
for _, part := range strings.Split(s, ",") {
@@ -128,11 +253,11 @@ func parseIPAlgo(s string) ([]embed.Field, error) {
switch part {
case "":
continue
case "namespace":
case string(embed.FieldNamespace):
out = append(out, embed.FieldNamespace)
case "pod":
case string(embed.FieldPod):
out = append(out, embed.FieldPod)
case "image":
case string(embed.FieldImage):
out = append(out, embed.FieldImage)
default:
return nil, fmt.Errorf("unknown ip-algo field %q (allowed: namespace, pod, image)", part)
@@ -144,8 +269,8 @@ func parseIPAlgo(s string) ([]embed.Field, error) {
return out, nil
}

// CNIArgs parses the K=V;K=V CNI_ARGS string for the kubelet keys we care
// about. Other keys are ignored.
// CNIArgs is the typed view of the K=V;K=V CNI_ARGS string passed by kubelet.
// We only keep the fields the agent uses; unknown keys are ignored.
type CNIArgs struct {
PodNamespace string
PodName string
@@ -153,6 +278,10 @@ type CNIArgs struct {
InfraID string
}

// ParseCNIArgs is permissive by design — kubelet versions and runtime
// shims pass varying sets of keys. Malformed entries are skipped silently
// rather than failing the whole ADD; required-key validation is the
// caller's responsibility.
func ParseCNIArgs(s string) CNIArgs {
var a CNIArgs
for _, kv := range strings.Split(s, ";") {

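The `ParseCNIArgs` comment above describes a permissive contract: split on `;`, skip malformed entries, and leave required-key checks to the caller. A minimal sketch of that contract follows. The key names come from the tests in this diff; `cniArgsSketch` and `parseCNIArgsSketch` are hypothetical names, and the body is an illustration, not the PR's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// cniArgsSketch is a hypothetical stand-in for the PR's CNIArgs type.
type cniArgsSketch struct {
	PodNamespace string
	PodName      string
	InfraID      string
}

// parseCNIArgsSketch splits "K=V;K=V" and keeps only the keys it knows.
// Entries without '=' or with an empty key are skipped rather than
// rejected; a repeated key overwrites the earlier value (last wins).
func parseCNIArgsSketch(s string) cniArgsSketch {
	var a cniArgsSketch
	for _, kv := range strings.Split(s, ";") {
		k, v, ok := strings.Cut(kv, "=")
		if !ok || k == "" {
			continue // malformed entry: skip it, never fail the whole ADD
		}
		switch k {
		case "K8S_POD_NAMESPACE":
			a.PodNamespace = v
		case "K8S_POD_NAME":
			a.PodName = v
		case "K8S_POD_INFRA_CONTAINER_ID":
			a.InfraID = v
		}
		// Unknown keys such as IgnoreUnknown fall through and are ignored.
	}
	return a
}

func main() {
	a := parseCNIArgsSketch(";;K8S_POD_NAMESPACE=ns;noequalshere;=novalue;K8S_POD_NAME=p")
	fmt.Printf("%+v\n", a)
}
```

Because `strings.Cut` splits at the first `=` only, a value that itself contains `=` is kept intact.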
@@ -0,0 +1,156 @@
package agent

import (
"testing"
)

// FuzzParseAnnotations explores the joint space of {ipv6, ipv4, cidr6, cidr4,
// ip-algo, anycast} annotations with random strings. Every recognised
// key is exercised by deriving a deterministic input map from the fuzzed
// strings; this gives the fuzzer reach into all parser branches at once.
//
// Properties checked:
//
// 1. The parser never panics on any input.
// 2. On nil-error return, the result satisfies the design-doc invariant
// that at least one of WantV6 / WantV4 is true (a pod always has at
// least one address).
// 3. The optional fields (Anycast, IPAlgo, CIDR6, CIDR4) are non-empty only
// when their annotation was supplied; they are never spontaneously populated.
//
// Seed corpus covers known edge cases the spec must handle.
func FuzzParseAnnotations(f *testing.F) {
// Seeds: each entry is six strings — the literal raw values for the
// six parsed keys. Empty string for "key absent".
type seed struct {
ipv6, ipv4, cidr6, cidr4, ipAlgo, anycast string
}
seeds := []seed{
{},
{ipv4: "true"},
{ipv6: "false", ipv4: "true"},
{ipv6: "TRUE"},
{ipv6: " true "},
{ipv6: "yes"}, // invalid → expect error
{ipv4: "1"}, // invalid
{cidr6: ""}, // same as {}: "" means "key absent" in the fuzz body
{cidr6: ","}, // invalid (empty after trim)
{cidr6: "2602:817:3000:f001::/64"}, // valid single
{cidr6: "2602:817:3000:f001::/64,"}, // trailing comma
{cidr6: " 2602:817:3000:f001::/64 "}, // surrounding whitespace
{cidr6: "2602:817:3000:f001::/64, 2602:817:3000:f002::/64"},
{cidr6: "10.0.0.0/8"}, // family mismatch
{cidr4: "172.25.210.0/24"}, // valid
{cidr4: "172.25.210.0/24,172.25.211.0/24"}, // multiple
{cidr4: "2602:817::/32"}, // family mismatch
{ipAlgo: "namespace,pod,image"},
{ipAlgo: "namespace, pod , image"}, // whitespace
{ipAlgo: "namespace,unknown"}, // invalid
{ipAlgo: ""}, // same as {}: "" means "key absent" in the fuzz body
{ipAlgo: ","}, // invalid
{anycast: "2602:817:3000:ac::1"},
{anycast: "2602:817:3000:ac::1, 172.25.255.1"},
{anycast: "::1"}, // loopback (allowed at parse time)
{anycast: "fe80::1"}, // link-local (allowed at parse time)
{anycast: "::ffff:10.0.0.1"}, // v4-mapped v6
{anycast: "0.0.0.0"}, // unspecified
{anycast: "definitely-not-an-ip"}, // invalid
{anycast: ""}, // same as {}: "" means "key absent" in the fuzz body
// Embedded NUL bytes
{ipv4: "true\x00"},
{cidr6: "2602:817:3000:f001::/64\x00"},
{anycast: "\x00\x00"},
// Unicode
{ipv4: "trüe"},
{ipAlgo: "námespace"},
// Very long
{cidr6: longString("2602:817:3000:f001::/64,", 4096)},
}
for _, s := range seeds {
f.Add(s.ipv6, s.ipv4, s.cidr6, s.cidr4, s.ipAlgo, s.anycast)
}

f.Fuzz(func(t *testing.T, ipv6, ipv4, cidr6, cidr4, ipAlgo, anycast string) {
in := map[string]string{}
// Treat empty as "key absent" so the seed table matches the run-time
// shape; Kubernetes annotations cannot have a nil value but they CAN
// be missing entirely. Empty-string-with-key is also a real case
// (operator typo), but this encoding cannot express it; cover it elsewhere.
if ipv6 != "" {
in[annotationPrefix+annIPv6] = ipv6
}
if ipv4 != "" {
in[annotationPrefix+annIPv4] = ipv4
}
if cidr6 != "" {
in[annotationPrefix+annCIDR6] = cidr6
}
if cidr4 != "" {
in[annotationPrefix+annCIDR4] = cidr4
}
if ipAlgo != "" {
in[annotationPrefix+annIPAlgo] = ipAlgo
}
if anycast != "" {
in[annotationPrefix+annAnycast] = anycast
}

got, err := ParseAnnotations(in, BuiltinFamilyDefaults())
if err != nil {
return // any error is acceptable; we only require no panic
}
// Property: at least one family must be selected.
if !got.WantV6 && !got.WantV4 {
t.Fatalf("parser accepted but produced no family: in=%#v", in)
}
// Property: optional fields populated only when their key was set.
if _, hasAlgo := in[annotationPrefix+annIPAlgo]; !hasAlgo && len(got.IPAlgo) != 0 {
t.Fatalf("IPAlgo populated without annotation")
}
if _, hasAny := in[annotationPrefix+annAnycast]; !hasAny && len(got.Anycast) != 0 {
t.Fatalf("Anycast populated without annotation")
}
if _, hasC6 := in[annotationPrefix+annCIDR6]; !hasC6 && len(got.CIDR6) != 0 {
t.Fatalf("CIDR6 populated without annotation")
}
if _, hasC4 := in[annotationPrefix+annCIDR4]; !hasC4 && len(got.CIDR4) != 0 {
t.Fatalf("CIDR4 populated without annotation")
}
})
}

// FuzzParseCNIArgs requires the parser to never panic on adversarial inputs.
// The parser is permissive by spec — it returns a CNIArgs with whatever it
// could extract — so the only invariant is "doesn't crash".
func FuzzParseCNIArgs(f *testing.F) {
f.Add("")
f.Add("=")
f.Add(";")
f.Add(";=;=;")
f.Add("K8S_POD_NAMESPACE=ns;K8S_POD_NAME=p")
f.Add("K8S_POD_NAMESPACE=ns;K8S_POD_NAME=p;K8S_POD_UID=abc;K8S_POD_INFRA_CONTAINER_ID=def")
f.Add("=value-only")
f.Add("key-only=")
f.Add("\x00\x00\x00")
f.Add("K8S_POD_NAMESPACE=\xff\xfe\xfd")
f.Add("K8S_POD_NAME=value;K8S_POD_NAME=other") // duplicate keys: last wins
// Long input
f.Add(longString("K8S_POD_NAME=x;", 4096))

f.Fuzz(func(t *testing.T, in string) {
_ = ParseCNIArgs(in)
})
}

// longString returns s repeated until the result is at least n bytes long,
// useful for building realistic-looking but oversized inputs.
func longString(s string, n int) string {
if len(s) == 0 {
return ""
}
var b []byte
for len(b) < n {
b = append(b, s...)
}
return string(b)
}
@@ -3,11 +3,68 @@ package agent
import (
"testing"

flockv1alpha1 "code.fritzlab.net/fritzlab/flock/pkg/api/v1alpha1"
"code.fritzlab.net/fritzlab/flock/pkg/embed"
)

func TestParseAnnotations_Defaults(t *testing.T) {
a, err := ParseAnnotations(nil)
// boolPtr returns a pointer to b — convenient for the *bool pointer fields
// in FamilyDefaults where nil means "unset".
func boolPtr(b bool) *bool { return &b }

func TestBuiltinFamilyDefaults(t *testing.T) {
d := BuiltinFamilyDefaults()
if !d.WantV6 || d.WantV4 {
t.Fatalf("built-in defaults wrong: v6=%v v4=%v (want true/false)", d.WantV6, d.WantV4)
}
}

func TestFamilyDefaultsFromNodeConfig_NilNodeConfig(t *testing.T) {
d := FamilyDefaultsFromNodeConfig(nil)
if d != BuiltinFamilyDefaults() {
t.Fatalf("nil NodeConfig should yield built-in defaults; got %+v", d)
}
}

func TestFamilyDefaultsFromNodeConfig_NilDefaults(t *testing.T) {
nc := &flockv1alpha1.NodeConfig{}
d := FamilyDefaultsFromNodeConfig(nc)
if d != BuiltinFamilyDefaults() {
t.Fatalf("missing Defaults should yield built-in; got %+v", d)
}
}

func TestFamilyDefaultsFromNodeConfig_PartialOverride(t *testing.T) {
nc := &flockv1alpha1.NodeConfig{
Spec: flockv1alpha1.NodeConfigSpec{
Defaults: &flockv1alpha1.FamilyDefaults{
IPv4: boolPtr(true),
},
},
}
d := FamilyDefaultsFromNodeConfig(nc)
// IPv6 was unset → keeps built-in true; IPv4 was set → flipped on.
if !d.WantV6 || !d.WantV4 {
t.Fatalf("partial override wrong: %+v (want v6=true, v4=true)", d)
}
}

func TestFamilyDefaultsFromNodeConfig_FullOverride(t *testing.T) {
nc := &flockv1alpha1.NodeConfig{
Spec: flockv1alpha1.NodeConfigSpec{
Defaults: &flockv1alpha1.FamilyDefaults{
IPv6: boolPtr(false),
IPv4: boolPtr(true),
},
},
}
d := FamilyDefaultsFromNodeConfig(nc)
if d.WantV6 || !d.WantV4 {
t.Fatalf("full override wrong: %+v (want v6=false, v4=true)", d)
}
}

func TestParseAnnotations_BuiltinDefaults(t *testing.T) {
a, err := ParseAnnotations(nil, BuiltinFamilyDefaults())
if err != nil {
t.Fatal(err)
}
@@ -16,10 +73,36 @@ func TestParseAnnotations_Defaults(t *testing.T) {
}
}

func TestParseAnnotations_DualStack(t *testing.T) {
func TestParseAnnotations_NodeDefaultsApplied(t *testing.T) {
// Node config says "IPv4 is on by default for this node".
d := FamilyDefaults{WantV6: true, WantV4: true}
a, err := ParseAnnotations(nil, d)
if err != nil {
t.Fatal(err)
}
if !a.WantV6 || !a.WantV4 {
t.Fatalf("node defaults not applied: %+v", a)
}
}

func TestParseAnnotations_AnnotationOverridesNodeDefault(t *testing.T) {
// Node says dual-stack by default; pod opts out of v4 explicitly.
d := FamilyDefaults{WantV6: true, WantV4: true}
a, err := ParseAnnotations(map[string]string{
annotationPrefix + "ipv4": "false",
}, d)
if err != nil {
t.Fatal(err)
}
if !a.WantV6 || a.WantV4 {
t.Fatalf("annotation override failed: %+v", a)
}
}

func TestParseAnnotations_DualStackViaAnnotation(t *testing.T) {
a, err := ParseAnnotations(map[string]string{
annotationPrefix + "ipv4": "true",
})
}, BuiltinFamilyDefaults())
if err != nil {
t.Fatal(err)
}
@@ -31,15 +114,49 @@ func TestParseAnnotations_DualStack(t *testing.T) {
func TestParseAnnotations_NoFamily(t *testing.T) {
if _, err := ParseAnnotations(map[string]string{
annotationPrefix + "ipv6": "false",
}); err == nil {
}, BuiltinFamilyDefaults()); err == nil {
t.Fatalf("expected error: ipv6=false ipv4=false")
}
}

func TestParseAnnotations_NoFamily_NodeDefaultsAlsoOff(t *testing.T) {
// Pathological NodeConfig that disables both families. Even with no pod
// annotation we must reject — otherwise a pod gets an empty allocation.
d := FamilyDefaults{WantV6: false, WantV4: false}
if _, err := ParseAnnotations(nil, d); err == nil {
t.Fatalf("expected error when both defaults are false")
}
}

func TestParseAnnotations_BoolStrictness(t *testing.T) {
// Common misuses that should be rejected so typos don't silently flip
// behaviour to the implicit-false default.
bad := []string{"1", "0", "yes", "no", "TrueFalse", " "}
for _, v := range bad {
_, err := ParseAnnotations(map[string]string{
annotationPrefix + "ipv4": v,
}, BuiltinFamilyDefaults())
if err == nil {
t.Errorf("expected error for ipv4=%q", v)
}
}
}

func TestParseAnnotations_BoolCaseInsensitive(t *testing.T) {
for _, v := range []string{"TRUE", "True", " true ", "FALSE", "False"} {
_, err := ParseAnnotations(map[string]string{
annotationPrefix + "ipv4": v,
}, BuiltinFamilyDefaults())
if err != nil {
t.Errorf("expected ipv4=%q to parse cleanly: %v", v, err)
}
}
}

func TestParseAnnotations_IPAlgo(t *testing.T) {
a, err := ParseAnnotations(map[string]string{
annotationPrefix + "ip-algo": "namespace,pod,image",
})
}, BuiltinFamilyDefaults())
if err != nil {
t.Fatal(err)
}
@@ -54,10 +171,18 @@ func TestParseAnnotations_IPAlgo(t *testing.T) {
}
}

func TestParseAnnotations_IPAlgo_Unknown(t *testing.T) {
if _, err := ParseAnnotations(map[string]string{
annotationPrefix + "ip-algo": "namespace,foo",
}, BuiltinFamilyDefaults()); err == nil {
t.Fatalf("expected unknown-field error")
}
}

func TestParseAnnotations_CIDR(t *testing.T) {
a, err := ParseAnnotations(map[string]string{
annotationPrefix + "cidr6": "2602:817:3000:f001::/64, 2602:817:3000:f002::/64",
})
}, BuiltinFamilyDefaults())
if err != nil {
t.Fatal(err)
}
@@ -66,9 +191,49 @@ func TestParseAnnotations_CIDR(t *testing.T) {
}
}

func TestParseAnnotations_CIDR_FamilyMismatch(t *testing.T) {
// v4 prefix in a cidr6 annotation must not silently slip through.
if _, err := ParseAnnotations(map[string]string{
annotationPrefix + "cidr6": "10.0.0.0/8",
}, BuiltinFamilyDefaults()); err == nil {
t.Fatalf("expected family mismatch error")
}
if _, err := ParseAnnotations(map[string]string{
annotationPrefix + "cidr4": "2602:817::/32",
}, BuiltinFamilyDefaults()); err == nil {
t.Fatalf("expected family mismatch error")
}
}

func TestParseAnnotations_Anycast_Mixed(t *testing.T) {
// Anycast accepts both families together — typical for a service that
// advertises one v6 and one v4 anycast IP.
a, err := ParseAnnotations(map[string]string{
annotationPrefix + "anycast": "2602:817:3000:ac::1, 172.25.255.1",
}, BuiltinFamilyDefaults())
if err != nil {
t.Fatal(err)
}
if len(a.Anycast) != 2 {
t.Fatalf("anycast len=%d", len(a.Anycast))
}
}

func TestParseCNIArgs(t *testing.T) {
args := ParseCNIArgs("IgnoreUnknown=1;K8S_POD_NAMESPACE=mail;K8S_POD_NAME=stalwart-0;K8S_POD_INFRA_CONTAINER_ID=abc123")
if args.PodNamespace != "mail" || args.PodName != "stalwart-0" || args.InfraID != "abc123" {
t.Fatalf("ParseCNIArgs got %+v", args)
}
}

func TestParseCNIArgs_EmptyAndMalformed(t *testing.T) {
// Permissive: malformed entries are skipped, never crash.
a := ParseCNIArgs("")
if a.PodName != "" {
t.Fatalf("empty input should yield empty CNIArgs, got %+v", a)
}
a = ParseCNIArgs(";;K8S_POD_NAMESPACE=ns;noequalshere;=novalue;K8S_POD_NAME=p")
if a.PodNamespace != "ns" || a.PodName != "p" {
t.Fatalf("permissive parse failed: %+v", a)
}
}

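The tests above pin down a three-level precedence for address families (pod annotation, then `NodeConfig.Spec.Defaults`, then the built-in v6=true/v4=false baseline) plus strict, case-insensitive booleans. The sketch below models that merge under stated assumptions: `nodeDefaults` is a hypothetical stand-in for the API's `FamilyDefaults` with its `*bool` fields, and the bare keys `ipv6`/`ipv4` replace `annotationPrefix + ...` because the prefix value is not shown in this diff.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// nodeDefaults is a hypothetical stand-in for NodeConfig.Spec.Defaults:
// a nil pointer means "unset, fall through to the built-in baseline".
type nodeDefaults struct {
	IPv6, IPv4 *bool
}

// parseStrictBool accepts only true/false, case-insensitive and trimmed.
func parseStrictBool(v string) (bool, error) {
	switch strings.ToLower(strings.TrimSpace(v)) {
	case "true":
		return true, nil
	case "false":
		return false, nil
	}
	return false, fmt.Errorf("invalid bool %q", v)
}

// resolveFamilies applies pod annotation > node default > built-in
// (v6=true, v4=false) and rejects a result with no family selected.
func resolveFamilies(ann map[string]string, node *nodeDefaults) (v6, v4 bool, err error) {
	v6, v4 = true, false // built-in baseline
	if node != nil {
		if node.IPv6 != nil {
			v6 = *node.IPv6
		}
		if node.IPv4 != nil {
			v4 = *node.IPv4
		}
	}
	for _, f := range []struct {
		key string
		dst *bool
	}{{"ipv6", &v6}, {"ipv4", &v4}} {
		raw, ok := ann[f.key] // reading a nil map is safe in Go
		if !ok {
			continue
		}
		b, perr := parseStrictBool(raw)
		if perr != nil {
			return false, false, fmt.Errorf("%s: %w", f.key, perr)
		}
		*f.dst = b
	}
	if !v6 && !v4 {
		return false, false, errors.New("no address family selected")
	}
	return v6, v4, nil
}

func main() {
	on := true
	// Node turns IPv4 on by default; the pod opts back out.
	v6, v4, err := resolveFamilies(map[string]string{"ipv4": "false"}, &nodeDefaults{IPv4: &on})
	fmt.Println(v6, v4, err) // prints: true false <nil>
}
```

A hand-rolled parser is used instead of `strconv.ParseBool` because that function accepts `1` and `0` (which `TestParseAnnotations_BoolStrictness` expects to be rejected) and does not trim surrounding whitespace.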
@@ -0,0 +1,22 @@
// Package agent owns the in-process flock-agent runtime. The agent is a
// single Linux DaemonSet pod per node and holds:
//
// - the durable per-node allocation file at /var/lib/flock/allocations.json
// (see Store in state.go),
// - an in-memory IPAM seeded from NodeConfig CIDRs and reconciled against
// the allocation file at startup (see ipam.go),
// - dynamic informers watching the per-node NodeConfig CR (nodeconfig.go)
// and the local-node Pod set (podinfo.go),
// - an RPC server speaking to the lightweight CNI plugin binary
// (cmd/flock and pkg/cni), so the binary exec'd for each CNI call stays a
// thin client and the real work is answered by a long-lived process,
// - the BirdManager that renders bird.conf and triggers `birdc reload`
// on changes (bird.go), and
// - the AnycastReconciler that programs per-pod /128 and /32 host routes
// gated on Pod readiness (anycast_linux.go).
//
// The package is split between platform-specific files (anycast_linux.go,
// netns_linux.go, runtime_linux.go) and stub files used on non-Linux build
// hosts so the rest of the package — IPAM, parsing, store, RPC plumbing —
// stays unit-testable on macOS and Windows CI.
package agent
@@ -49,7 +49,8 @@ func (h *PodHandler) Add(ctx context.Context, req flockcni.Request) (*current.Re
return nil, fmt.Errorf("lookup pod: %w", err)
}

parsed, err := ParseAnnotations(pod.Annotations)
defaults := FamilyDefaultsFromNodeConfig(h.NodeConfig.Load())
parsed, err := ParseAnnotations(pod.Annotations, defaults)
if err != nil {
return nil, fmt.Errorf("parse annotations: %w", err)
}

@@ -0,0 +1,63 @@
package agent

import (
"strings"
"testing"
)

func TestHostIfaceName_Format(t *testing.T) {
got := HostIfaceName("0123456789abcdef0123456789abcdef")
if !strings.HasPrefix(got, "flock") || len(got) != len("flock")+8 {
t.Fatalf("HostIfaceName=%q (want flock + 8 hex)", got)
}
}

func TestHostIfaceName_Determinism(t *testing.T) {
a := HostIfaceName("container-xyz")
b := HostIfaceName("container-xyz")
if a != b {
t.Fatalf("not deterministic: %s vs %s", a, b)
}
}

func TestHostIfaceName_DifferentInputs(t *testing.T) {
a := HostIfaceName("a")
b := HostIfaceName("b")
if a == b {
t.Fatalf("collision on trivial inputs")
}
}

// FuzzHostIfaceName ensures the host interface name generator never produces
// an output longer than IFNAMSIZ-1 (15 chars on Linux) and never panics.
// The name format is "flock" + 8 hex chars = 13 chars, always.
func FuzzHostIfaceName(f *testing.F) {
f.Add("")
f.Add("a")
f.Add("/var/run/netns/abc")
f.Add("0123456789abcdef0123456789abcdef")
f.Add(longString("x", 64*1024)) // very long containerID
f.Add("\x00\x00\x00")
f.Add("ünïcødé/контейнер")

f.Fuzz(func(t *testing.T, id string) {
got := HostIfaceName(id)
// Linux IFNAMSIZ is 16 (15 chars + NUL); ours must fit comfortably.
if len(got) > 15 {
t.Fatalf("HostIfaceName(%q)=%q exceeds 15 chars", id, got)
}
if !strings.HasPrefix(got, "flock") {
t.Fatalf("HostIfaceName(%q)=%q missing prefix", id, got)
}
// Suffix must be lowercase hex (8 chars).
suffix := got[len("flock"):]
if len(suffix) != 8 {
t.Fatalf("HostIfaceName(%q) suffix len=%d", id, len(suffix))
}
for _, c := range suffix {
if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')) {
t.Fatalf("HostIfaceName(%q)=%q has non-hex suffix", id, got)
}
}
})
}

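The tests only fix the shape of `HostIfaceName` output: `flock` plus 8 lowercase hex characters, deterministic per container ID, and at most 15 bytes. One way to meet that contract is sketched below; hashing with SHA-256 and keeping the first 4 bytes is an assumption, and `hostIfaceNameSketch` is a hypothetical name rather than the PR's function.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hostIfaceNameSketch derives a deterministic host-side interface name:
// "flock" + the first 4 bytes of SHA-256(containerID) in lowercase hex.
// That is 5 + 8 = 13 bytes, under the 15 usable bytes of IFNAMSIZ.
// The hash choice is an assumption; only the output format is pinned.
func hostIfaceNameSketch(containerID string) string {
	sum := sha256.Sum256([]byte(containerID))
	return "flock" + hex.EncodeToString(sum[:4])
}

func main() {
	fmt.Println(hostIfaceNameSketch("container-xyz"))
}
```

Truncating to 32 bits keeps the name at 13 bytes; a 50% birthday-collision chance would need roughly 77,000 names on a single node.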
+43
-20
@@ -62,13 +62,15 @@ func (cryptoRand) PickIndex(n int) int {
}

// AllocRequest describes a pending allocation. Values come from Pod metadata
// + annotations at CNI ADD time.
// + annotations at CNI ADD time, with per-node FamilyDefaults already merged
// in (see ParseAnnotations).
type AllocRequest struct {
ContainerID string
Namespace string
Pod string
// WantV6 / WantV4 come from the ipv6 / ipv4 annotations (defaults in
// design doc: ipv6=true, ipv4=false).
// WantV6 / WantV4 are the post-merge address family selection (pod
// annotation > NodeConfig.Spec.Defaults > built-in baseline). At least
// one MUST be true; Allocate rejects the request otherwise.
WantV6 bool
WantV4 bool
// AnnCIDR6 / AnnCIDR4 come from the cidr6 / cidr4 annotations. Empty
@@ -224,34 +226,36 @@ func (i *IPAM) allocV6(cidr *net.IPNet, req AllocRequest) (net.IP, error) {

// randomV6 picks a random /128 inside cidr. The network prefix bits are
// preserved from cidr.IP; the host bits are filled from the random source.
//
// Implementation: walk the 16 IPv6 bytes once. For each byte we ask whether
// it's entirely inside the network mask (skip), entirely inside the host
// portion (overwrite with random), or split (combine bits from both).
func (i *IPAM) randomV6(cidr *net.IPNet) (net.IP, error) {
ones, bits := cidr.Mask.Size()
if bits != 128 {
return nil, fmt.Errorf("cidr %s is not IPv6", cidr)
}
out := make(net.IP, 16)
out := make(net.IP, net.IPv6len)
copy(out, cidr.IP.To16())
hostBits := 128 - ones
rnd := make([]byte, 16)
rnd := make([]byte, net.IPv6len)
i.randSrc.FillIID(rnd)
// Merge rnd into out where mask bit is 0.
for b := 0; b < 16; b++ {
// Host bits start at bit index `ones`, byte `b`.
for b := 0; b < net.IPv6len; b++ {
byteStart := b * 8
byteEnd := byteStart + 8
if byteEnd <= ones {
continue // entirely network
}
if byteStart >= ones {
out[b] = rnd[b] // entirely host
switch {
case byteEnd <= ones:
// Entirely inside the network prefix — leave untouched.
continue
case byteStart >= ones:
// Entirely inside the host portion — fully randomise.
out[b] = rnd[b]
default:
// Split byte: top (ones-byteStart) bits are network, rest host.
networkBits := ones - byteStart
hostMask := byte(0xFF) >> uint(networkBits)
out[b] = (out[b] & ^hostMask) | (rnd[b] & hostMask)
}
// Split byte: top (ones-byteStart) bits are network, rest is host.
networkBits := ones - byteStart
hostMask := byte(0xFF) >> uint(networkBits)
out[b] = (out[b] & ^hostMask) | (rnd[b] & hostMask)
}
_ = hostBits
return out, nil
}

@@ -360,15 +364,34 @@ func toStringSlice(ns []*net.IPNet) []string {
return out
}

// canonical returns the textual form of ip in its native family, so the same
// host address is always represented identically regardless of whether it
// arrived as a 4-byte slice, a 16-byte v4-in-v6 slice, or a string-parsed
// net.IP. Used as the key for the in-use map.
//
// Returns "" for nil input or a slice that is neither 4 nor 16 bytes long;
// callers MUST treat the returned key as opaque and never use "" as a sentinel.
func canonical(ip net.IP) string {
if ip == nil {
return ""
}
if v4 := ip.To4(); v4 != nil {
return v4.String()
}
return ip.To16().String()
if v16 := ip.To16(); v16 != nil {
return v16.String()
}
return ""
}

// ipToU32 reads a 4-byte IPv4 net.IP into a uint32. The caller is expected
// to have already validated that ip is an IPv4 address; mis-use returns 0
// rather than panicking.
func ipToU32(ip net.IP) uint32 {
v4 := ip.To4()
if v4 == nil {
return 0
}
return uint32(v4[0])<<24 | uint32(v4[1])<<16 | uint32(v4[2])<<8 | uint32(v4[3])
}

@@ -0,0 +1,169 @@
package agent

import (
"net"
"testing"
)

// FuzzIPAM_Allocate runs randomly-driven Allocate/Release sequences against
// a /120 IPv6 + /28 IPv4 IPAM so the fuzzer can hit address exhaustion.
//
// Properties checked:
//
// 1. Allocate never panics regardless of the action stream.
// 2. A successful Allocate never returns an address that is still live
// (allocated and not yet released); released addresses may be reused.
// 3. A successful v6 allocation always yields an address inside the
// configured /120, and a successful v4 always inside the configured /28.
// 4. Every v4 result is a 4-byte address and never lands on .0 (network),
// .1 (gateway) or .15 (broadcast) of the /28.
//
// The fuzzed bytes are interpreted as an opcode stream:
// - bytes[i] & 0x03 selects the action: 0=alloc-v6, 1=alloc-v4,
// 2=alloc-dual, 3=release-most-recent.
// - the same bytes also seed the deterministic random source (fakeRand
// nibbles and makeIID IIDs), so different inputs drive different choices.
func FuzzIPAM_Allocate(f *testing.F) {
f.Add([]byte{0, 0, 0, 0})
f.Add([]byte{1, 1, 1, 1})
f.Add([]byte{2, 2, 2, 2})
f.Add([]byte{0, 1, 2, 3})
f.Add([]byte(longString("\x00\x01\x02\x03", 256)))

f.Fuzz(func(t *testing.T, ops []byte) {
ipam, err := NewIPAM(
[]string{"2001:db8::/120"}, // 256 addresses; only the last IID byte matters
[]string{"10.0.0.0/28"}, // 13 allocatable hosts (.2-.14)
)
if err != nil {
t.Fatal(err)
}
// Deterministic source: replay nibbles cycled from `ops`.
fr := &fakeRand{
nibbles: append([]byte{}, ops...),
iids: [][]byte{
// 16 bytes of "host portion" — only the last byte matters
// for a /120 prefix.
makeIID(ops, 0),
makeIID(ops, 1),
makeIID(ops, 2),
makeIID(ops, 3),
},
}
if len(fr.nibbles) == 0 {
fr.nibbles = []byte{0}
}
ipam.randSrc = fr

net6 := mustNet(t, "2001:db8::/120")
net4 := mustNet(t, "10.0.0.0/28")

var live []AllocResult
seen := map[string]struct{}{}

for idx, op := range ops {
req := AllocRequest{ContainerID: idStr(idx)}
switch op & 0x03 {
case 0:
req.WantV6 = true
case 1:
req.WantV4 = true
case 2:
req.WantV6, req.WantV4 = true, true
case 3:
if len(live) == 0 {
continue
}
rel := live[len(live)-1]
live = live[:len(live)-1]
ipam.Release(rel.IP6, rel.IP4)
delete(seen, canonical(rel.IP6))
delete(seen, canonical(rel.IP4))
continue
}

res, err := ipam.Allocate(req)
if err != nil {
continue // exhaustion is acceptable
}

if req.WantV6 {
if res.IP6 == nil {
t.Fatalf("requested v6 but got nil")
}
if !net6.Contains(res.IP6) {
t.Fatalf("v6 %s outside /120", res.IP6)
}
if _, dup := seen[canonical(res.IP6)]; dup {
t.Fatalf("v6 %s duplicated", res.IP6)
}
seen[canonical(res.IP6)] = struct{}{}
}
if req.WantV4 {
if res.IP4 == nil {
t.Fatalf("requested v4 but got nil")
}
if !net4.Contains(res.IP4) {
t.Fatalf("v4 %s outside /28", res.IP4)
}
v4 := res.IP4.To4()
if v4 == nil {
t.Fatalf("v4 result not 4-byte: %s", res.IP4)
}
// The allocator must never hand out .0 (network), .15 (broadcast),
// or .1 (reserved for the gateway by convention).
last := v4[3]
if last == 0 || last == 1 || last == 15 {
t.Fatalf("v4 %s in reserved range", res.IP4)
}
if _, dup := seen[canonical(res.IP4)]; dup {
t.Fatalf("v4 %s duplicated", res.IP4)
}
seen[canonical(res.IP4)] = struct{}{}
}
live = append(live, res)
}
})
}

// FuzzCanonical asserts that canonical never panics and is idempotent.
func FuzzCanonical(f *testing.F) {
f.Add([]byte{})
f.Add([]byte{1, 2, 3, 4})
f.Add([]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0})
f.Add([]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff, 10, 0, 0, 1}) // v4-mapped v6
f.Add([]byte{0xff})

f.Fuzz(func(t *testing.T, b []byte) {
ip := net.IP(b)
s1 := canonical(ip)
// Idempotent: re-canonicalising the parsed form yields the same
// string for any non-empty result.
if s1 != "" {
parsed := net.ParseIP(s1)
if parsed == nil {
t.Fatalf("canonical(%v)=%q is not parseable as IP", b, s1)
}
if got := canonical(parsed); got != s1 {
t.Fatalf("not idempotent: %q -> %q", s1, got)
}
}
})
}

func makeIID(seed []byte, salt byte) []byte {
out := make([]byte, net.IPv6len)
for i := range out {
if i < len(seed) {
out[i] = seed[i] ^ salt
} else {
out[i] = salt
}
}
return out
}

func idStr(i int) string {
const hex = "0123456789abcdef"
return string([]byte{'c', '-', hex[(i>>4)&0xF], hex[i&0xF]})
}

@@ -0,0 +1,85 @@
//go:build linux

package netpol

import (
"bytes"
"context"
"fmt"
"os/exec"
"time"
)

// Applier hands rendered nft scripts to the kernel via `nft -f -`.
// nftables guarantees the entire script applies atomically — if any line
// is rejected, the previous ruleset stays intact.
//
// Applier maintains the last-applied script string and skips the exec
// when the new render is byte-identical, so a 5s reconcile tick on a
// quiet cluster is cheap.
type Applier struct {
// NftPath is the path to the nft binary. Empty means "look up `nft`
// on PATH". Tests set this to a fake.
NftPath string

// Timeout bounds an individual nft invocation; if zero, defaults to
// 5 seconds.
Timeout time.Duration

last string
}

// Apply runs `nft -f -` with the supplied script. Idempotent: if script
// equals the last successful application, this is a no-op.
//
// Returns an error from nft (with stderr captured) if the script is
// malformed or the kernel rejects it.
func (a *Applier) Apply(ctx context.Context, script string) error {
if script == a.last {
return nil
}
timeout := a.Timeout
if timeout == 0 {
timeout = 5 * time.Second
}
bin := a.NftPath
if bin == "" {
bin = "nft"
}
cctx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
cmd := exec.CommandContext(cctx, bin, "-f", "-")
cmd.Stdin = bytes.NewBufferString(script)
var stderr bytes.Buffer
cmd.Stderr = &stderr
if err := cmd.Run(); err != nil {
return fmt.Errorf("nft -f -: %w: %s", err, stderr.String())
}
a.last = script
return nil
}

// Clear tears down the flock NetworkPolicy table — used by graceful
// shutdown so a stopping agent doesn't leave stale enforcement behind.
// Best-effort: if nft is missing or the table doesn't exist, returns
// nil.
func (a *Applier) Clear(ctx context.Context) error {
timeout := a.Timeout
if timeout == 0 {
timeout = 5 * time.Second
}
bin := a.NftPath
if bin == "" {
bin = "nft"
}
cctx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
cmd := exec.CommandContext(cctx, bin, "destroy", "table", "inet", "flock_netpol")
if err := cmd.Run(); err != nil {
// `destroy` already tolerates a missing table; any other failure (nft
// absent, or too old to know `destroy`) is swallowed since Clear is best-effort.
return nil
}
a.last = ""
return nil
}

@@ -0,0 +1,16 @@
|
||||
//go:build !linux
|
||||
|
||||
package netpol
|
||||
|
||||
import "context"
|
||||
|
||||
// Applier is a no-op on non-Linux build hosts so unit tests run on macOS
|
||||
// without nft.
|
||||
type Applier struct {
|
||||
NftPath string
|
||||
Timeout interface{}
|
||||
last string
|
||||
}
|
||||
|
||||
func (a *Applier) Apply(_ context.Context, script string) error { a.last = script; return nil }
|
||||
func (a *Applier) Clear(_ context.Context) error { a.last = ""; return nil }
|
||||
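Reconciler holds a concrete `*Applier`, so the Linux Applier, this stub, and the test fakes stay in step only by convention. One common Go approach, sketched here with hypothetical names (`ruleApplier`, `stubApplier`, `shutdown`), is a two-method interface plus a compile-time assertion, so the build fails as soon as a stub's method set drifts.

```go
package main

import (
	"context"
	"fmt"
)

// ruleApplier is a hypothetical interface covering the two methods a
// reconciler needs from an applier.
type ruleApplier interface {
	Apply(ctx context.Context, script string) error
	Clear(ctx context.Context) error
}

// stubApplier mimics the non-Linux Applier: it only records state.
type stubApplier struct{ last string }

func (a *stubApplier) Apply(_ context.Context, script string) error { a.last = script; return nil }
func (a *stubApplier) Clear(_ context.Context) error              { a.last = ""; return nil }

// Compile-time check: the build fails if stubApplier stops satisfying ruleApplier.
var _ ruleApplier = (*stubApplier)(nil)

// shutdown is a hypothetical consumer that depends only on the interface.
func shutdown(a ruleApplier) error { return a.Clear(context.Background()) }

func main() {
	s := &stubApplier{}
	_ = s.Apply(context.Background(), "table inet flock_netpol {}")
	fmt.Printf("%q\n", s.last)
	_ = shutdown(s)
	fmt.Printf("%q\n", s.last) // ""
}
```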
@@ -0,0 +1,250 @@
|
||||
package netpol
|
||||
|
||||
import (
|
||||
"net"
|
||||
"strings"
|
||||
"testing"
|
||||
|
||||
corev1 "k8s.io/api/core/v1"
|
||||
netv1 "k8s.io/api/networking/v1"
|
||||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
|
||||
"k8s.io/apimachinery/pkg/util/intstr"
|
||||
)
|
||||
|
||||
// These fixtures mirror the three NetworkPolicies that were live in the
// sjc001 cluster on 2026-04-25. They serve as integration-shaped tests:
// the translator and renderer together must produce an nft script with
// the expected accept rules and peer filters for each.
|
||||
//
|
||||
// Source of truth (refresh by running `kubectl get netpol -A -o yaml`):
|
||||
//
|
||||
// - calico-apiserver/allow-apiserver
|
||||
// - remote-proxies/lodge-home-assistant-ingress
|
||||
// - storage/garage-admin-restrict
|
||||
|
||||
// allowApiserverPolicy: TCP/5443 ingress to apiserver=true pods, no peer
|
||||
// restriction (allow-from-anywhere on that port).
|
||||
func allowApiserverPolicy() netv1.NetworkPolicy {
|
||||
tcp := corev1.ProtocolTCP
|
||||
port := intstr.FromInt32(5443)
|
||||
return netv1.NetworkPolicy{
|
||||
ObjectMeta: metav1.ObjectMeta{Namespace: "calico-apiserver", Name: "allow-apiserver"},
|
||||
Spec: netv1.NetworkPolicySpec{
|
||||
PodSelector: metav1.LabelSelector{MatchLabels: map[string]string{"apiserver": "true"}},
|
||||
PolicyTypes: []netv1.PolicyType{netv1.PolicyTypeIngress},
|
||||
Ingress: []netv1.NetworkPolicyIngressRule{{
|
||||
Ports: []netv1.NetworkPolicyPort{{Protocol: &tcp, Port: &port}},
|
||||
}},
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
// lodgeHomeAssistantPolicy: TCP/8080 from any pod in the `edge` namespace
|
||||
// to pods labelled app=lodge-home-assistant.
|
||||
func lodgeHomeAssistantPolicy() netv1.NetworkPolicy {
|
||||
tcp := corev1.ProtocolTCP
|
||||
port := intstr.FromInt32(8080)
|
||||
return netv1.NetworkPolicy{
|
||||
ObjectMeta: metav1.ObjectMeta{Namespace: "remote-proxies", Name: "lodge-home-assistant-ingress"},
|
||||
Spec: netv1.NetworkPolicySpec{
|
||||
PodSelector: metav1.LabelSelector{MatchLabels: map[string]string{"app": "lodge-home-assistant"}},
|
||||
PolicyTypes: []netv1.PolicyType{netv1.PolicyTypeIngress},
|
||||
Ingress: []netv1.NetworkPolicyIngressRule{{
|
||||
From: []netv1.NetworkPolicyPeer{{
|
||||
NamespaceSelector: &metav1.LabelSelector{
|
||||
MatchLabels: map[string]string{"kubernetes.io/metadata.name": "edge"},
|
||||
},
|
||||
}},
|
||||
Ports: []netv1.NetworkPolicyPort{{Protocol: &tcp, Port: &port}},
|
||||
}},
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
// garageAdminPolicy: complex two-rule policy.
|
||||
//
|
||||
// 1. Allow TCP/{3900, 80, 3901} from anywhere.
|
||||
// 2. Allow TCP/3903 only from pods in `edge` or `storage`.
|
||||
func garageAdminPolicy() netv1.NetworkPolicy {
|
||||
tcp := corev1.ProtocolTCP
|
||||
p3900 := intstr.FromInt32(3900)
|
||||
p80 := intstr.FromInt32(80)
|
||||
p3901 := intstr.FromInt32(3901)
|
||||
p3903 := intstr.FromInt32(3903)
|
||||
return netv1.NetworkPolicy{
|
||||
ObjectMeta: metav1.ObjectMeta{Namespace: "storage", Name: "garage-admin-restrict"},
|
||||
Spec: netv1.NetworkPolicySpec{
|
||||
PodSelector: metav1.LabelSelector{MatchLabels: map[string]string{"app": "garage"}},
|
||||
PolicyTypes: []netv1.PolicyType{netv1.PolicyTypeIngress},
|
||||
Ingress: []netv1.NetworkPolicyIngressRule{
|
||||
{
|
||||
Ports: []netv1.NetworkPolicyPort{
|
||||
{Protocol: &tcp, Port: &p3900},
|
||||
{Protocol: &tcp, Port: &p80},
|
||||
{Protocol: &tcp, Port: &p3901},
|
||||
},
|
||||
},
|
||||
{
|
||||
From: []netv1.NetworkPolicyPeer{
|
||||
{NamespaceSelector: &metav1.LabelSelector{
|
||||
MatchLabels: map[string]string{"kubernetes.io/metadata.name": "edge"},
|
||||
}},
|
||||
{NamespaceSelector: &metav1.LabelSelector{
|
||||
MatchLabels: map[string]string{"kubernetes.io/metadata.name": "storage"},
|
||||
}},
|
||||
},
|
||||
Ports: []netv1.NetworkPolicyPort{{Protocol: &tcp, Port: &p3903}},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
// TestClusterFixture_AllowApiserver — pod selected by the policy gets
|
||||
// isolated; the rendered script accepts TCP/5443 from anywhere.
|
||||
func TestClusterFixture_AllowApiserver(t *testing.T) {
|
||||
pod := Pod{
|
||||
Namespace: "calico-apiserver",
|
||||
Name: "calico-apiserver-1",
|
||||
Labels: map[string]string{"apiserver": "true"},
|
||||
HostIface: "flock00000001",
|
||||
IPs: []net.IP{mustIP("2001:db8::1")},
|
||||
}
|
||||
out, err := Translate(Inputs{
|
||||
LocalPods: []Pod{pod},
|
||||
Policies: []netv1.NetworkPolicy{allowApiserverPolicy()},
|
||||
}, nil)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
in, _ := isolationFor(out, "calico-apiserver/calico-apiserver-1")
|
||||
if !in {
|
||||
t.Fatalf("apiserver pod should be isolated for ingress")
|
||||
}
|
||||
script := Render(out)
|
||||
if !strings.Contains(script, "tcp dport 5443 accept") {
|
||||
t.Fatalf("expected TCP/5443 allow:\n%s", script)
|
||||
}
|
||||
// No peer filter — allow-all-on-port.
|
||||
if strings.Contains(script, "ip6 saddr {") || strings.Contains(script, "ip saddr {") {
|
||||
t.Fatalf("expected no peer filter for allow-from-anywhere:\n%s", script)
|
||||
}
|
||||
}
|
||||
|
||||
// TestClusterFixture_LodgeHomeAssistant — pod isolated; only TCP/8080
|
||||
// from edge namespace is allowed.
|
||||
func TestClusterFixture_LodgeHomeAssistant(t *testing.T) {
|
||||
pod := Pod{
|
||||
Namespace: "remote-proxies",
|
||||
Name: "lodge-home-assistant-0",
|
||||
Labels: map[string]string{"app": "lodge-home-assistant"},
|
||||
HostIface: "flock00000002",
|
||||
IPs: []net.IP{mustIP("2001:db8::2")},
|
||||
}
|
||||
traefik := PeerPod{
|
||||
Namespace: "edge", Name: "traefik-0",
|
||||
Labels: map[string]string{"app": "traefik"},
|
||||
IPs: []net.IP{mustIP("2001:db8::aa")},
|
||||
}
|
||||
stranger := PeerPod{
|
||||
Namespace: "default", Name: "random",
|
||||
Labels: map[string]string{"app": "random"},
|
||||
IPs: []net.IP{mustIP("2001:db8::bb")},
|
||||
}
|
||||
out, err := Translate(Inputs{
|
||||
LocalPods: []Pod{pod},
|
||||
PeerPods: []PeerPod{traefik, stranger},
|
||||
Namespaces: []Namespace{
|
||||
{Name: "edge", Labels: map[string]string{"kubernetes.io/metadata.name": "edge"}},
|
||||
{Name: "default", Labels: map[string]string{"kubernetes.io/metadata.name": "default"}},
|
||||
{Name: "remote-proxies", Labels: map[string]string{"kubernetes.io/metadata.name": "remote-proxies"}},
|
||||
},
|
||||
Policies: []netv1.NetworkPolicy{lodgeHomeAssistantPolicy()},
|
||||
}, nil)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if len(out.Rules) != 1 {
|
||||
t.Fatalf("expected 1 rule, got %d", len(out.Rules))
|
||||
}
|
||||
r := out.Rules[0]
|
||||
// Peer should be exactly traefik's IP, not stranger's.
|
||||
got := map[string]bool{}
|
||||
for _, c := range r.PeerCIDRs {
|
||||
got[c.IP.String()] = true
|
||||
}
|
||||
if !got["2001:db8::aa"] {
|
||||
t.Fatalf("traefik IP missing from rule: %v", got)
|
||||
}
|
||||
if got["2001:db8::bb"] {
|
||||
t.Fatalf("stranger IP leaked into rule")
|
||||
}
|
||||
script := Render(out)
|
||||
if !strings.Contains(script, "tcp dport 8080 accept") {
|
||||
t.Fatalf("expected TCP/8080 allow:\n%s", script)
|
||||
}
|
||||
}
|
||||
|
||||
// TestClusterFixture_Garage — verifies the two-rule policy:
|
||||
//
|
||||
// 1. ports {3900, 80, 3901} accept from any peer
|
||||
// 2. port 3903 accept only from edge or storage namespaces
|
||||
func TestClusterFixture_Garage(t *testing.T) {
|
||||
pod := Pod{
|
||||
Namespace: "storage", Name: "garage-0",
|
||||
Labels: map[string]string{"app": "garage"},
|
||||
HostIface: "flock00000003",
|
||||
IPs: []net.IP{mustIP("2001:db8::3")},
|
||||
}
|
||||
storagePeer := PeerPod{
|
||||
Namespace: "storage", Name: "garage-1",
|
||||
Labels: map[string]string{"app": "garage"},
|
||||
IPs: []net.IP{mustIP("2001:db8::31")},
|
||||
}
|
||||
edgePeer := PeerPod{
|
||||
Namespace: "edge", Name: "traefik-0",
|
||||
Labels: map[string]string{"app": "traefik"},
|
||||
IPs: []net.IP{mustIP("2001:db8::41")},
|
||||
}
|
||||
stranger := PeerPod{
|
||||
Namespace: "default", Name: "random",
|
||||
Labels: map[string]string{"app": "random"},
|
||||
IPs: []net.IP{mustIP("2001:db8::ff")},
|
||||
}
|
||||
out, err := Translate(Inputs{
|
||||
LocalPods: []Pod{pod},
|
||||
PeerPods: []PeerPod{storagePeer, edgePeer, stranger},
|
||||
Namespaces: []Namespace{
|
||||
{Name: "edge", Labels: map[string]string{"kubernetes.io/metadata.name": "edge"}},
|
||||
{Name: "storage", Labels: map[string]string{"kubernetes.io/metadata.name": "storage"}},
|
||||
{Name: "default", Labels: map[string]string{"kubernetes.io/metadata.name": "default"}},
|
||||
},
|
||||
Policies: []netv1.NetworkPolicy{garageAdminPolicy()},
|
||||
}, nil)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
// Two ingress rules in the source policy → two Rules out (one per
|
||||
// peer set, ports inline).
|
||||
if len(out.Rules) != 2 {
|
||||
t.Fatalf("expected 2 rules (one per ingress entry), got %d", len(out.Rules))
|
||||
}
|
||||
script := Render(out)
|
||||
for _, want := range []string{
|
||||
"tcp dport 3900 accept",
|
||||
"tcp dport 80 accept",
|
||||
"tcp dport 3901 accept",
|
||||
"tcp dport 3903 accept",
|
||||
} {
|
||||
if !strings.Contains(script, want) {
|
||||
t.Errorf("missing %q in script:\n%s", want, script)
|
||||
}
|
||||
}
|
||||
// Only the 3903 rule has peers, so the edge and storage peer IPs can
// appear only in its peer filter; the stranger's IP must not appear.
|
||||
if !strings.Contains(script, "2001:db8::31/128") || !strings.Contains(script, "2001:db8::41/128") {
|
||||
t.Fatalf("expected edge+storage peer IPs in 3903 rule:\n%s", script)
|
||||
}
|
||||
if strings.Contains(script, "2001:db8::ff/128") {
|
||||
t.Fatalf("stranger IP must not appear:\n%s", script)
|
||||
}
|
||||
}
|
||||
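The lodge and garage fixtures rely on namespaceSelector peers: a peer pod matches when the labels of its namespace satisfy the selector, and separate peers in one `From` list are ORed. The stdlib-only sketch below replays the garage 3903 rule. It assumes MatchLabels-only selectors (no MatchExpressions, no combined podSelector) and uses a hypothetical `peer` type in place of flock's PeerPod.

```go
package main

import (
	"fmt"
	"sort"
)

// peer is a hypothetical, trimmed-down peer pod: namespace plus one IP.
type peer struct {
	Namespace string
	IP        string
}

// matchLabels reports whether labels contain every key/value pair in sel.
// An empty selector matches everything, as in Kubernetes.
func matchLabels(sel, labels map[string]string) bool {
	for k, v := range sel {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// peersFromNamespaces returns the sorted IPs of peers whose namespace
// labels satisfy at least one selector (peers in a From list are ORed).
func peersFromNamespaces(selectors []map[string]string, nsLabels map[string]map[string]string, peers []peer) []string {
	var out []string
	for _, p := range peers {
		for _, sel := range selectors {
			if matchLabels(sel, nsLabels[p.Namespace]) {
				out = append(out, p.IP)
				break
			}
		}
	}
	sort.Strings(out)
	return out
}

// garageExample replays the 3903 rule from the garage fixture.
func garageExample() []string {
	name := "kubernetes.io/metadata.name"
	nsLabels := map[string]map[string]string{
		"edge":    {name: "edge"},
		"storage": {name: "storage"},
		"default": {name: "default"},
	}
	selectors := []map[string]string{{name: "edge"}, {name: "storage"}}
	peers := []peer{
		{"storage", "2001:db8::31"},
		{"edge", "2001:db8::41"},
		{"default", "2001:db8::ff"},
	}
	return peersFromNamespaces(selectors, nsLabels, peers)
}

func main() {
	fmt.Println(garageExample()) // [2001:db8::31 2001:db8::41]
}
```

Note the `!ok` check in matchLabels: a selector value of `""` still requires the key to be present, which a plain `labels[k] != v` comparison would get wrong.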
@@ -0,0 +1,44 @@
|
||||
// Package netpol implements Kubernetes NetworkPolicy enforcement for flock.
|
||||
//
|
||||
// # Model
|
||||
//
|
||||
// NetworkPolicy is a Kubernetes-native API (`networking.k8s.io/v1`) that
|
||||
// describes which pods may receive traffic (Ingress) and/or initiate
|
||||
// traffic (Egress). The semantics are isolation by selection: a pod that is
|
||||
// selected by *any* NetworkPolicy in a given direction becomes default-deny
|
||||
// in that direction, plus the union of all "allow" rules from every policy
|
||||
// that selects it. A pod selected by no policy is unrestricted.
|
||||
//
|
||||
// flock enforces these semantics with nftables. Each agent is responsible
|
||||
// for the pods scheduled on its own node — peer addresses (from
|
||||
// podSelector / namespaceSelector / ipBlock peers) come from a cluster-wide
|
||||
// informer set so the agent can resolve peers that live elsewhere.
|
||||
//
|
||||
// # Pipeline
|
||||
//
|
||||
// The work is split into four stages with hard boundaries between them so
|
||||
// each can be tested in isolation:
|
||||
//
|
||||
// 1. Informers (informers.go) — watch NetworkPolicies, Namespaces, and
|
||||
// all Pods in the cluster. Maintain indices the translator can query.
|
||||
//
|
||||
// 2. Translator (translator.go) — pure function from
|
||||
// (NetworkPolicy set, Namespace set, Pod set, local-node pod set) to
|
||||
// []Rule. No I/O, no hidden state — straightforward to fuzz and unit
|
||||
// test. Implements the default-deny semantics and the peer-resolution
|
||||
// rules from the NetworkPolicy spec.
|
||||
//
|
||||
// 3. Renderer (render.go) — pure function from []Rule to an nft script
|
||||
// (string). Output is deterministic so the apply stage can de-dupe.
|
||||
//
|
||||
// 4. Apply (apply_linux.go) — shell out to `nft -f -` for an atomic
|
||||
// reconfiguration. nftables guarantees the whole script applies as a
|
||||
// single transaction; partial failures roll back automatically.
|
||||
//
|
||||
// # Why nftables (and not eBPF)
|
||||
//
|
||||
// Atomic ruleset transactions, kernel-native, no userspace eBPF loader to
// maintain, and behaviour an operator can read directly with
// `nft list ruleset`. The cost is that each forwarded packet is matched
// linearly against the base chain's per-pod jump rules, which is fine at
// the cluster sizes flock targets.
|
||||
package netpol
|
||||
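The isolation-by-selection rule described in the Model section can be stated as a small predicate. This sketch is a simplification under stated assumptions: ingress only, all policies in the pod's own namespace, MatchLabels pod selectors, and rules that allow a port from any peer. `policy`, `selects`, and `ingressAllowed` are hypothetical names, not the translator's.

```go
package main

import "fmt"

// policy is a hypothetical, ingress-only reduction of a NetworkPolicy in
// the pod's namespace: a MatchLabels pod selector plus the ports its rules
// allow from any peer.
type policy struct {
	Selector     map[string]string // empty selects every pod in the namespace
	AllowedPorts []int
}

func selects(sel, labels map[string]string) bool {
	for k, v := range sel {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// ingressAllowed applies isolation by selection: a pod no policy selects is
// unrestricted; a selected pod accepts only the union of the allowed ports
// of every policy that selects it.
func ingressAllowed(policies []policy, podLabels map[string]string, port int) bool {
	isolated := false
	for _, p := range policies {
		if !selects(p.Selector, podLabels) {
			continue
		}
		isolated = true
		for _, ap := range p.AllowedPorts {
			if ap == port {
				return true
			}
		}
	}
	return !isolated
}

func main() {
	web := map[string]string{"app": "web"}
	db := map[string]string{"app": "db"}
	denyAll := policy{Selector: map[string]string{}} // selects all, allows nothing
	allowWeb := policy{Selector: web, AllowedPorts: []int{8080}}
	both := []policy{denyAll, allowWeb}

	fmt.Println(ingressAllowed(nil, db, 5432))  // true: no policy selects db
	fmt.Println(ingressAllowed(both, web, 8080)) // true: union of allows
	fmt.Println(ingressAllowed(both, web, 22))   // false: web is isolated
	fmt.Println(ingressAllowed(both, db, 8080))  // false: only deny-all selects db
}
```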
@@ -0,0 +1,222 @@
|
||||
package netpol
|
||||
|
||||
import (
|
||||
"context"
|
||||
"fmt"
|
||||
"log/slog"
|
||||
"net"
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
corev1 "k8s.io/api/core/v1"
|
||||
netv1 "k8s.io/api/networking/v1"
|
||||
"k8s.io/client-go/informers"
|
||||
"k8s.io/client-go/kubernetes"
|
||||
"k8s.io/client-go/rest"
|
||||
"k8s.io/client-go/tools/cache"
|
||||
)
|
||||
|
||||
// World aggregates the cluster-wide caches the reconciler queries on
// every pass: NetworkPolicies, Namespaces, and all Pods (for peer
// resolution). The maps are guarded by mu; the snapshot accessors return
// shallow copies, so callers can use the results without holding the lock.
|
||||
type World struct {
|
||||
logger *slog.Logger
|
||||
|
||||
mu sync.RWMutex
|
||||
policies map[string]netv1.NetworkPolicy // key = ns/name
|
||||
namespaces map[string]Namespace
|
||||
peerPods map[string]PeerPod // key = ns/name
|
||||
|
||||
onChange []func()
|
||||
}
|
||||
|
||||
// NewWorld returns an empty World. Callers should call Start to populate
|
||||
// it; before Start, the snapshot accessors return empty slices.
|
||||
func NewWorld(logger *slog.Logger) *World {
|
||||
return &World{
|
||||
logger: logger,
|
||||
policies: map[string]netv1.NetworkPolicy{},
|
||||
namespaces: map[string]Namespace{},
|
||||
peerPods: map[string]PeerPod{},
|
||||
}
|
||||
}
|
||||
|
||||
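// OnChange registers a callback fired synchronously from the informer
// event handler, after the World lock is released, whenever any watched
// object changes. Callbacks should return quickly. The reconciler
// registers its Trigger here, which coalesces bursts of events into a
// single pending pass.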
|
||||
func (w *World) OnChange(f func()) {
|
||||
w.mu.Lock()
|
||||
defer w.mu.Unlock()
|
||||
w.onChange = append(w.onChange, f)
|
||||
}
|
||||
|
||||
func (w *World) fireChange() {
|
||||
w.mu.RLock()
|
||||
cbs := append([]func(){}, w.onChange...)
|
||||
w.mu.RUnlock()
|
||||
for _, f := range cbs {
|
||||
f()
|
||||
}
|
||||
}
|
||||
|
||||
// Start launches three informers (NetworkPolicy, Namespace, Pod) against
|
||||
// the cluster API. It blocks until each cache reports synced. The caller
|
||||
// is responsible for cancelling ctx on shutdown.
|
||||
func (w *World) Start(ctx context.Context, cfg *rest.Config) error {
|
||||
cs, err := kubernetes.NewForConfig(cfg)
|
||||
if err != nil {
|
||||
return fmt.Errorf("kubernetes client: %w", err)
|
||||
}
|
||||
factory := informers.NewSharedInformerFactory(cs, 10*time.Minute)
|
||||
|
||||
npInformer := factory.Networking().V1().NetworkPolicies().Informer()
|
||||
nsInformer := factory.Core().V1().Namespaces().Informer()
|
||||
podInformer := factory.Core().V1().Pods().Informer()
|
||||
|
||||
if _, err := npInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
|
||||
AddFunc: func(obj interface{}) { w.onPolicy(obj, false) },
|
||||
UpdateFunc: func(_, n interface{}) { w.onPolicy(n, false) },
|
||||
DeleteFunc: func(obj interface{}) { w.onPolicy(obj, true) },
|
||||
}); err != nil {
|
||||
return fmt.Errorf("add netpol handler: %w", err)
|
||||
}
|
||||
if _, err := nsInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
|
||||
AddFunc: func(obj interface{}) { w.onNamespace(obj, false) },
|
||||
UpdateFunc: func(_, n interface{}) { w.onNamespace(n, false) },
|
||||
DeleteFunc: func(obj interface{}) { w.onNamespace(obj, true) },
|
||||
}); err != nil {
|
||||
return fmt.Errorf("add ns handler: %w", err)
|
||||
}
|
||||
if _, err := podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
|
||||
AddFunc: func(obj interface{}) { w.onPod(obj, false) },
|
||||
UpdateFunc: func(_, n interface{}) { w.onPod(n, false) },
|
||||
DeleteFunc: func(obj interface{}) { w.onPod(obj, true) },
|
||||
}); err != nil {
|
||||
return fmt.Errorf("add pod handler: %w", err)
|
||||
}
|
||||
|
||||
w.logger.Info("netpol informers starting")
|
||||
factory.Start(ctx.Done())
|
||||
if !cache.WaitForCacheSync(ctx.Done(),
|
||||
npInformer.HasSynced, nsInformer.HasSynced, podInformer.HasSynced) {
|
||||
return fmt.Errorf("netpol informer caches failed to sync")
|
||||
}
|
||||
w.logger.Info("netpol informers synced",
|
||||
"netpols", len(w.snapshotPolicies()),
|
||||
"namespaces", len(w.snapshotNamespaces()),
|
||||
"peer_pods", len(w.snapshotPeerPods()))
|
||||
return nil
|
||||
}
|
||||
|
||||
// unwrapDFSU lifts a DeletedFinalStateUnknown wrapper if present.
|
||||
func unwrapDFSU(obj interface{}) interface{} {
|
||||
if d, ok := obj.(cache.DeletedFinalStateUnknown); ok {
|
||||
return d.Obj
|
||||
}
|
||||
return obj
|
||||
}
|
||||
|
||||
func (w *World) onPolicy(obj interface{}, deleted bool) {
|
||||
p, ok := unwrapDFSU(obj).(*netv1.NetworkPolicy)
|
||||
if !ok || p == nil {
|
||||
return
|
||||
}
|
||||
key := p.Namespace + "/" + p.Name
|
||||
w.mu.Lock()
|
||||
if deleted {
|
||||
delete(w.policies, key)
|
||||
} else {
|
||||
w.policies[key] = *p
|
||||
}
|
||||
w.mu.Unlock()
|
||||
w.fireChange()
|
||||
}
|
||||
|
||||
func (w *World) onNamespace(obj interface{}, deleted bool) {
|
||||
ns, ok := unwrapDFSU(obj).(*corev1.Namespace)
|
||||
if !ok || ns == nil {
|
||||
return
|
||||
}
|
||||
w.mu.Lock()
|
||||
if deleted {
|
||||
delete(w.namespaces, ns.Name)
|
||||
} else {
|
||||
w.namespaces[ns.Name] = Namespace{Name: ns.Name, Labels: ns.Labels}
|
||||
}
|
||||
w.mu.Unlock()
|
||||
w.fireChange()
|
||||
}
|
||||
|
||||
func (w *World) onPod(obj interface{}, deleted bool) {
|
||||
pod, ok := unwrapDFSU(obj).(*corev1.Pod)
|
||||
if !ok || pod == nil {
|
||||
return
|
||||
}
|
||||
key := pod.Namespace + "/" + pod.Name
|
||||
w.mu.Lock()
|
||||
if deleted {
|
||||
delete(w.peerPods, key)
|
||||
} else {
|
||||
w.peerPods[key] = PeerPod{
|
||||
Namespace: pod.Namespace,
|
||||
Name: pod.Name,
|
||||
Labels: pod.Labels,
|
||||
IPs: podIPs(pod),
|
||||
}
|
||||
}
|
||||
w.mu.Unlock()
|
||||
w.fireChange()
|
||||
}
|
||||
|
||||
// podIPs extracts every PodIP from the status. Pods with no addresses yet
// (e.g. still scheduling) yield an empty slice, which the translator
// tolerates.
|
||||
func podIPs(p *corev1.Pod) []net.IP {
|
||||
out := make([]net.IP, 0, len(p.Status.PodIPs))
|
||||
for _, addr := range p.Status.PodIPs {
|
||||
ip := net.ParseIP(addr.IP)
|
||||
if ip == nil {
|
||||
continue
|
||||
}
|
||||
out = append(out, ip)
|
||||
}
|
||||
if len(out) == 0 && p.Status.PodIP != "" {
|
||||
// Older clusters may populate PodIP but not PodIPs; tolerate both.
|
||||
if ip := net.ParseIP(p.Status.PodIP); ip != nil {
|
||||
out = append(out, ip)
|
||||
}
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
// snapshotPolicies returns a defensive copy of the policy map's values.
|
||||
func (w *World) snapshotPolicies() []netv1.NetworkPolicy {
|
||||
w.mu.RLock()
|
||||
defer w.mu.RUnlock()
|
||||
out := make([]netv1.NetworkPolicy, 0, len(w.policies))
|
||||
for _, p := range w.policies {
|
||||
out = append(out, p)
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
// snapshotNamespaces returns a defensive copy of the namespace map.
|
||||
func (w *World) snapshotNamespaces() []Namespace {
|
||||
w.mu.RLock()
|
||||
defer w.mu.RUnlock()
|
||||
out := make([]Namespace, 0, len(w.namespaces))
|
||||
for _, n := range w.namespaces {
|
||||
out = append(out, n)
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
// snapshotPeerPods returns a defensive copy of the peer-pod map.
|
||||
func (w *World) snapshotPeerPods() []PeerPod {
|
||||
w.mu.RLock()
|
||||
defer w.mu.RUnlock()
|
||||
out := make([]PeerPod, 0, len(w.peerPods))
|
||||
for _, p := range w.peerPods {
|
||||
out = append(out, p)
|
||||
}
|
||||
return out
|
||||
}
|
||||
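World avoids a classic RWMutex deadlock: the informer handlers release the write lock before calling fireChange, and fireChange copies the callback slice under RLock before invoking anything, so a callback may itself read World. A reduced, hypothetical `cache` type shows the same pattern with a callback that reads the cache re-entrantly.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// cache is a hypothetical miniature of World: one map guarded by an
// RWMutex plus change callbacks.
type cache struct {
	mu       sync.RWMutex
	items    map[string]string
	onChange []func()
}

func newCache() *cache { return &cache{items: map[string]string{}} }

func (c *cache) OnChange(f func()) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.onChange = append(c.onChange, f)
}

// Set updates the map, then fires callbacks after releasing the write lock.
// Firing while holding it would deadlock any callback that reads the cache,
// because RLock blocks while a writer holds the lock.
func (c *cache) Set(k, v string) {
	c.mu.Lock()
	c.items[k] = v
	c.mu.Unlock()
	c.fire()
}

func (c *cache) fire() {
	c.mu.RLock()
	cbs := append([]func(){}, c.onChange...) // copy so OnChange may run concurrently
	c.mu.RUnlock()
	for _, f := range cbs {
		f()
	}
}

// Keys returns a sorted copy; callers never touch the live map.
func (c *cache) Keys() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make([]string, 0, len(c.items))
	for k := range c.items {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func main() {
	c := newCache()
	var seen []string
	c.OnChange(func() { seen = c.Keys() }) // re-entrant read from the callback
	c.Set("ns1/web", "2001:db8::1")
	c.Set("ns1/db", "2001:db8::2")
	fmt.Println(seen) // [ns1/db ns1/web]
}
```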
@@ -0,0 +1,115 @@
|
||||
package netpol
|
||||
|
||||
import (
|
||||
"context"
|
||||
"log/slog"
|
||||
"sync"
|
||||
"time"
|
||||
)
|
||||
|
||||
// LocalPodSource produces the set of local pods (with their HostIface and
|
||||
// IPs) the reconciler should enforce policy for. The agent's allocation
|
||||
// store + pod informer is the natural implementer.
|
||||
//
|
||||
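// reconcile calls the function while holding the Reconciler's own mutex,
// so a single Reconciler never invokes it concurrently with itself, but no
// agent-side lock is held; it must be safe to call concurrently with
// whatever updates the state it reads.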
|
||||
type LocalPodSource func() []Pod
|
||||
|
||||
// Reconciler turns the World cache + LocalPodSource into nft rule
|
||||
// applications. One reconcile pass:
|
||||
//
|
||||
// pods + policies + namespaces → Translate → Render → Apply
|
||||
//
|
||||
// The pass runs on:
|
||||
//
|
||||
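// - World.OnChange (any informer event), coalesced through a one-slot
// buffered channel (a burst of events yields at most one pending pass),
// - a periodic tick (default 30s) that re-renders from the caches.
// Applier skips a script identical to the last one it applied, so the
// tick alone does not repair out-of-band kernel changes (e.g. someone
// running `nft flush ruleset`); a restarted agent or a changed script
// reinstalls the table,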
|
||||
// - and explicit Trigger() calls (the agent fires this from CNI ADD /
|
||||
// DEL hooks so policy lands before pod traffic flows).
|
||||
type Reconciler struct {
|
||||
World *World
|
||||
Local LocalPodSource
|
||||
Applier *Applier
|
||||
Logger *slog.Logger
|
||||
Interval time.Duration
|
||||
|
||||
mu sync.Mutex
|
||||
trigger chan struct{}
|
||||
}
|
||||
|
||||
// NewReconciler returns a Reconciler ready to Run. Interval defaults to
|
||||
// 30s if zero.
|
||||
func NewReconciler(world *World, local LocalPodSource, applier *Applier, logger *slog.Logger) *Reconciler {
|
||||
r := &Reconciler{
|
||||
World: world,
|
||||
Local: local,
|
||||
Applier: applier,
|
||||
Logger: logger,
|
||||
Interval: 30 * time.Second,
|
||||
trigger: make(chan struct{}, 1),
|
||||
}
|
||||
world.OnChange(r.Trigger)
|
||||
return r
|
||||
}
|
||||
|
||||
// Trigger requests one reconcile pass. Coalesces — if a pass is already
|
||||
// pending, the call is a no-op.
|
||||
func (r *Reconciler) Trigger() {
|
||||
select {
|
||||
case r.trigger <- struct{}{}:
|
||||
default:
|
||||
}
|
||||
}
|
||||
|
||||
// Run blocks until ctx is cancelled. Reconciles on Trigger or every
|
||||
// Interval; calls Applier.Clear on shutdown.
|
||||
func (r *Reconciler) Run(ctx context.Context) {
|
||||
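interval := r.Interval
if interval <= 0 {
// time.NewTicker panics on non-positive durations, so fall back to
// the documented 30s default.
interval = 30 * time.Second
}
t := time.NewTicker(interval)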
|
||||
defer t.Stop()
|
||||
r.reconcile(ctx) // initial pass
|
||||
for {
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
// Best-effort: drop our table on graceful exit. If the agent
|
||||
// crashed without doing this, the next agent's first apply
|
||||
// will replace the stale table atomically anyway.
|
||||
_ = r.Applier.Clear(context.Background())
|
||||
return
|
||||
case <-t.C:
|
||||
r.reconcile(ctx)
|
||||
case <-r.trigger:
|
||||
r.reconcile(ctx)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func (r *Reconciler) reconcile(ctx context.Context) {
|
||||
r.mu.Lock()
|
||||
defer r.mu.Unlock()
|
||||
|
||||
in := Inputs{
|
||||
LocalPods: r.Local(),
|
||||
PeerPods: r.World.snapshotPeerPods(),
|
||||
Namespaces: r.World.snapshotNamespaces(),
|
||||
Policies: r.World.snapshotPolicies(),
|
||||
}
|
||||
out, err := Translate(in, func(s string) { r.Logger.Warn(s) })
|
||||
if err != nil {
|
||||
r.Logger.Warn("netpol translate failed", "err", err)
|
||||
return
|
||||
}
|
||||
script := Render(out)
|
||||
if err := r.Applier.Apply(ctx, script); err != nil {
|
||||
r.Logger.Warn("netpol apply failed", "err", err)
|
||||
return
|
||||
}
|
||||
if len(out.Isolated) > 0 {
|
||||
r.Logger.Info("netpol applied",
|
||||
"isolated_chains", len(out.Isolated),
|
||||
"rules", len(out.Rules),
|
||||
"local_pods", len(in.LocalPods),
|
||||
"policies", len(in.Policies))
|
||||
}
|
||||
}
|
||||
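Trigger coalesces rather than debounces: the one-slot buffered channel holds at most one pending signal, and further sends fall through to the `select` default, so a burst of informer events yields a single follow-up pass with no added delay. A stripped-down, hypothetical `coalescer` type demonstrates the behaviour; `drain` stands in for the reconcile loop.

```go
package main

import "fmt"

// coalescer is a hypothetical reduction of Reconciler's trigger channel.
type coalescer struct{ trigger chan struct{} }

func newCoalescer() *coalescer { return &coalescer{trigger: make(chan struct{}, 1)} }

// Trigger never blocks: if a pass is already pending, the send is dropped.
func (c *coalescer) Trigger() {
	select {
	case c.trigger <- struct{}{}:
	default:
	}
}

// drain counts pending passes without blocking, as a loop iteration would.
func (c *coalescer) drain() int {
	passes := 0
	for {
		select {
		case <-c.trigger:
			passes++
		default:
			return passes
		}
	}
}

func main() {
	c := newCoalescer()
	for i := 0; i < 100; i++ {
		c.Trigger() // e.g. a burst of pod status updates
	}
	fmt.Println(c.drain()) // 1
	c.Trigger()
	fmt.Println(c.drain()) // 1
	fmt.Println(c.drain()) // 0
}
```

Because reconcile reads fresh snapshots each time, collapsing many signals into one pass loses nothing: the single pass sees every change that triggered it.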
@@ -0,0 +1,160 @@
|
||||
package netpol
|
||||
|
||||
import (
|
||||
"context"
|
||||
"io"
|
||||
"log/slog"
|
||||
"net"
|
||||
"strings"
|
||||
"sync"
|
||||
"sync/atomic"
|
||||
"testing"
|
||||
|
||||
corev1 "k8s.io/api/core/v1"
|
||||
netv1 "k8s.io/api/networking/v1"
|
||||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
|
||||
)
|
||||
|
||||
// fakeApplier captures Apply calls for assertion. Reconciler.Applier is the
// concrete *Applier, so this fake cannot be plugged into a Reconciler; it is
// driven through applierIface by reconcileOnce instead.
|
||||
type fakeApplier struct {
|
||||
mu sync.Mutex
|
||||
calls []string
|
||||
last string
|
||||
err error
|
||||
}
|
||||
|
||||
func (f *fakeApplier) Apply(_ context.Context, script string) error {
|
||||
f.mu.Lock()
|
||||
defer f.mu.Unlock()
|
||||
if f.err != nil {
|
||||
return f.err
|
||||
}
|
||||
if script == f.last {
|
||||
return nil // de-dup like the real Applier
|
||||
}
|
||||
f.last = script
|
||||
f.calls = append(f.calls, script)
|
||||
return nil
|
||||
}
|
||||
func (f *fakeApplier) Clear(_ context.Context) error { return nil }
|
||||
func (f *fakeApplier) lastScript() string {
|
||||
f.mu.Lock()
|
||||
defer f.mu.Unlock()
|
||||
return f.last
|
||||
}
|
||||
func (f *fakeApplier) callCount() int {
|
||||
f.mu.Lock()
|
||||
defer f.mu.Unlock()
|
||||
return len(f.calls)
|
||||
}
|
||||
|
||||
// applierIface is satisfied by *Applier and *fakeApplier. reconcileOnce
// takes it and replays the body of Reconciler.reconcile, so these tests
// cover Translate, Render, and de-dup, but not Reconciler's scheduling.
|
||||
type applierIface interface {
|
||||
Apply(context.Context, string) error
|
||||
Clear(context.Context) error
|
||||
}
|
||||
|
||||
// reconcileOnce drives one pass synchronously without spinning a goroutine.
|
||||
func reconcileOnce(t *testing.T, world *World, local LocalPodSource, app applierIface) {
|
||||
t.Helper()
|
||||
in := Inputs{
|
||||
LocalPods: local(),
|
||||
PeerPods: world.snapshotPeerPods(),
|
||||
Namespaces: world.snapshotNamespaces(),
|
||||
Policies: world.snapshotPolicies(),
|
||||
}
|
||||
out, err := Translate(in, nil)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if err := app.Apply(context.Background(), Render(out)); err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
}
|
||||
|
||||
// silentLogger returns a slog.Logger discarding everything — keeps test
|
||||
// output tidy.
|
||||
func silentLogger() *slog.Logger {
|
||||
return slog.New(slog.NewTextHandler(io.Discard, &slog.HandlerOptions{}))
|
||||
}
|
||||
|
||||
func TestReconciler_NoIsolatedPods_ShortScript(t *testing.T) {
|
||||
world := NewWorld(silentLogger())
|
||||
local := func() []Pod { return nil }
|
||||
app := &fakeApplier{}
|
||||
reconcileOnce(t, world, local, app)
|
||||
got := app.lastScript()
|
||||
if !strings.Contains(got, "table inet flock_netpol") {
|
||||
t.Fatalf("missing table:\n%s", got)
|
||||
}
|
||||
// Without any isolated pods the base chain has policy accept and no
|
||||
// jumps. That's the desired "open" state.
|
||||
if strings.Contains(got, "jump pod_") {
|
||||
t.Fatalf("unexpected jump in open state:\n%s", got)
|
||||
}
|
||||
}
|
||||
|
||||
func TestReconciler_PolicyIsolatesLocalPod(t *testing.T) {
|
||||
world := NewWorld(silentLogger())
|
||||
|
||||
// Seed a default-deny policy in ns1.
|
||||
world.onPolicy(&netv1.NetworkPolicy{
|
||||
ObjectMeta: metav1.ObjectMeta{Namespace: "ns1", Name: "deny-all"},
|
||||
Spec: netv1.NetworkPolicySpec{
|
||||
PodSelector: metav1.LabelSelector{},
|
||||
PolicyTypes: []netv1.PolicyType{netv1.PolicyTypeIngress},
|
||||
},
|
||||
}, false)
|
||||
|
||||
local := func() []Pod {
|
||||
return []Pod{{
|
||||
Namespace: "ns1", Name: "web",
|
||||
Labels: map[string]string{"app": "web"},
|
||||
HostIface: "flock00000001",
|
||||
IPs: []net.IP{mustIP("2001:db8::1")},
|
||||
}}
|
||||
}
|
||||
app := &fakeApplier{}
|
||||
reconcileOnce(t, world, local, app)
|
||||
got := app.lastScript()
|
||||
|
||||
if !strings.Contains(got, "_ingress {") {
|
||||
t.Fatalf("expected pod ingress chain:\n%s", got)
|
||||
}
|
||||
if !strings.Contains(got, "drop") {
|
||||
t.Fatalf("expected default-deny drop:\n%s", got)
|
||||
}
|
||||
if !strings.Contains(got, `oifname "flock00000001"`) {
|
||||
t.Fatalf("expected base-chain jump anchored on veth:\n%s", got)
|
||||
}
|
||||
}
|
||||
|
||||
func TestReconciler_DedupesIdenticalRender(t *testing.T) {
|
||||
world := NewWorld(silentLogger())
|
||||
local := func() []Pod {
|
||||
return []Pod{{
|
||||
Namespace: "ns1", Name: "web", HostIface: "f1",
|
||||
IPs: []net.IP{mustIP("2001:db8::1")},
|
||||
}}
|
||||
}
|
||||
app := &fakeApplier{}
|
||||
reconcileOnce(t, world, local, app)
|
||||
reconcileOnce(t, world, local, app)
|
||||
reconcileOnce(t, world, local, app)
|
||||
if got := app.callCount(); got != 1 {
|
||||
t.Fatalf("expected 1 unique apply, got %d", got)
|
||||
}
|
||||
}
|
||||
|
||||
func TestReconciler_OnChangeFiresTrigger(t *testing.T) {
|
||||
world := NewWorld(silentLogger())
|
||||
var triggered atomic.Int32
|
||||
world.OnChange(func() { triggered.Add(1) })
|
||||
world.onNamespace(&corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: "foo"}}, false)
|
||||
world.onPolicy(&netv1.NetworkPolicy{ObjectMeta: metav1.ObjectMeta{Namespace: "foo", Name: "p"}}, false)
|
||||
if triggered.Load() != 2 {
|
||||
t.Fatalf("expected 2 OnChange calls, got %d", triggered.Load())
|
||||
}
|
||||
}
|
||||
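Render's de-dup contract (equal Output gives a byte-identical script) only holds if nothing is emitted in Go's randomized map iteration order. A minimal sketch of the idea renders base-chain jumps for ingress chains sorted by chain name. The FNV-1a `chainName` scheme and both function names are assumptions for illustration, not necessarily flock's naming, and hash collisions are not handled.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// chainName derives a fixed-width chain name from a pod key. The FNV-1a
// scheme is a hypothetical choice for this sketch.
func chainName(podKey, dir string) string {
	h := fnv.New32a()
	h.Write([]byte(podKey)) // hash.Hash writes never return an error
	return fmt.Sprintf("pod_%08x_%s", h.Sum32(), dir)
}

// renderIngressJumps emits one base-chain jump per ingress chain, sorted by
// chain name, so equal inputs yield byte-identical output regardless of
// Go's randomized map iteration order.
func renderIngressJumps(ifaceByPod map[string]string) string {
	ifaceByChain := make(map[string]string, len(ifaceByPod))
	for podKey, iface := range ifaceByPod {
		ifaceByChain[chainName(podKey, "ingress")] = iface
	}
	names := make([]string, 0, len(ifaceByChain))
	for n := range ifaceByChain {
		names = append(names, n)
	}
	sort.Strings(names)
	var sb strings.Builder
	for _, n := range names {
		fmt.Fprintf(&sb, "\t\toifname %q jump %s\n", ifaceByChain[n], n)
	}
	return sb.String()
}

func main() {
	pods := map[string]string{
		"ns1/web": "flock00000001",
		"ns1/db":  "flock00000002",
		"ns2/api": "flock00000003",
	}
	first := renderIngressJumps(pods)
	same := true
	for i := 0; i < 100; i++ {
		same = same && renderIngressJumps(pods) == first
	}
	fmt.Print(first)
	fmt.Println("deterministic:", same) // deterministic: true
}
```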
@@ -0,0 +1,322 @@
|
||||
package netpol
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"hash/fnv"
|
||||
"net"
|
||||
"sort"
|
||||
"strings"
|
||||
)
|
||||
|
||||
// Render produces an nftables script that, when applied with `nft -f -`,
|
||||
// installs the desired NetworkPolicy enforcement state for this node.
|
||||
//
|
||||
// Layout:
|
||||
//
|
||||
// table inet flock_netpol {
|
||||
// chain forward { # base chain on hook forward
|
||||
// type filter hook forward priority filter; policy accept;
|
||||
// # one jump per (pod, direction) that has rules and/or isolation
|
||||
// iifname "flock1a2b3c4d" ip6 saddr 2001:db8::1 jump pod_<hash>_egress
|
||||
// oifname "flock1a2b3c4d" ip6 daddr 2001:db8::1 jump pod_<hash>_ingress
|
||||
// }
|
||||
// chain pod_<hash>_ingress { # one per isolated direction
|
||||
// # explicit allow lines (empty for default-deny)
|
||||
// drop
|
||||
// }
|
||||
// chain pod_<hash>_egress { ... }
|
||||
// }
|
||||
//
|
||||
// The whole table is replaced atomically: a `destroy table` (unlike
// `delete table`, it does not fail when the table is absent; it needs
// nftables 1.0.8+ and Linux 6.3+) followed by the full table definition
// with all its chains. nft executes the script as a single transaction,
// so partial application is impossible.
|
||||
//
|
||||
// Output is deterministic: equal Output → byte-identical script. The
|
||||
// reconciler relies on this for de-dup.
|
||||
func Render(out Output) string {
|
||||
var sb strings.Builder
|
||||
|
||||
sb.WriteString("# Generated by flock-agent netpol; do not edit by hand.\n")
|
||||
// `destroy` removes the previous table if present and is a no-op on the
// first run (unlike `delete`, it does not fail on a missing table). The
// table block below then recreates everything in the same transaction.
|
||||
sb.WriteString("destroy table inet flock_netpol\n")
|
||||
sb.WriteString("table inet flock_netpol {\n")
|
||||
|
||||
// Build per-(pod, direction) chains. We need them defined BEFORE the
|
||||
// base chain references them, so we render chains first.
|
||||
chains := buildChains(out)
|
||||
for _, c := range chains {
|
||||
writeChain(&sb, c)
|
||||
}
|
||||
|
||||
// Base chain emits jumps in a stable order (chain name asc).
|
||||
sb.WriteString("\tchain forward {\n")
|
||||
sb.WriteString("\t\ttype filter hook forward priority filter; policy accept;\n")
|
||||
for _, c := range chains {
|
||||
writeBaseJump(&sb, c)
|
||||
}
|
||||
sb.WriteString("\t}\n")
|
||||
|
||||
sb.WriteString("}\n")
|
||||
return sb.String()
|
||||
}
|
||||
|
||||
// chain is one rendered chain — one direction of one pod.
|
||||
type chain struct {
|
||||
name string // pod_<hash>_ingress / _egress
|
||||
hostIface string
|
||||
podIPs []net.IP
|
||||
direction Direction
|
||||
rules []Rule
|
||||
policy string // "drop" or "accept"
|
||||
}
|
||||
|
||||
// buildChains groups rules by (PodKey, Direction) and adds default-deny
|
||||
// chains for isolated directions that received no explicit rules.
|
||||
func buildChains(out Output) []chain {
|
||||
type key struct {
|
||||
podKey string
|
||||
dir Direction
|
||||
}
|
||||
byKey := map[key]*chain{}
|
||||
|
||||
// Seed isolated directions with empty chains so default-deny lands
|
||||
// even when no explicit allow rule was emitted for them.
|
||||
for iso := range out.Isolated {
|
||||
byKey[key{podKey: iso.PodKey, dir: iso.Direction}] = &chain{
|
||||
direction: iso.Direction,
|
||||
policy: "drop",
|
||||
}
|
||||
}
|
||||
|
||||
// Append rules into their chain. Rule.PodIPs and HostIface are
|
||||
// authoritative — every rule for a given pod carries the same values
|
||||
// (translator invariant), so we copy from the first.
|
||||
for _, r := range out.Rules {
|
||||
k := key{podKey: r.PodKey, dir: r.Direction}
|
||||
c := byKey[k]
|
||||
if c == nil {
|
||||
// Rule for a non-isolated direction shouldn't happen in
|
||||
// practice (translator only emits rules for selected pods)
|
||||
// but be tolerant — the chain just gets policy accept.
|
||||
c = &chain{direction: r.Direction, policy: "accept"}
|
||||
byKey[k] = c
|
||||
}
|
||||
c.rules = append(c.rules, r)
|
||||
if c.hostIface == "" {
|
||||
c.hostIface = r.HostIface
|
||||
c.podIPs = append([]net.IP(nil), r.PodIPs...)
|
||||
}
|
||||
}
|
||||
|
||||
// If a chain was created from Isolated only (no rules), look up the
|
||||
// pod's HostIface + IPs from Output.Pods. This is the path a
|
||||
// default-deny policy takes — no allow rules, only isolation.
|
||||
for k, c := range byKey {
|
||||
if c.hostIface != "" {
|
||||
continue
|
||||
}
|
||||
if lp, ok := out.Pods[k.podKey]; ok {
|
||||
c.hostIface = lp.HostIface
|
||||
c.podIPs = append([]net.IP(nil), lp.IPs...)
|
||||
continue
|
||||
}
|
||||
// Last resort: lift from any rule sharing the PodKey. Should
|
||||
// not normally happen — the translator populates Pods for every
|
||||
// isolated pod — but defends against partially-populated Output
|
||||
// values constructed by tests.
|
||||
for _, r := range out.Rules {
|
||||
if r.PodKey == k.podKey {
|
||||
c.hostIface = r.HostIface
|
||||
c.podIPs = append([]net.IP(nil), r.PodIPs...)
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Materialise chain names and emit in deterministic order.
|
||||
var chains []chain
|
||||
for k, c := range byKey {
|
||||
if c.hostIface == "" {
|
||||
continue // can't jump to it; skip
|
||||
}
|
||||
c.name = chainName(k.podKey, c.direction)
|
||||
chains = append(chains, *c)
|
||||
}
|
||||
sort.Slice(chains, func(i, j int) bool { return chains[i].name < chains[j].name })
|
||||
return chains
|
||||
}
|
||||
|
||||
// chainName produces a stable, name-safe chain identifier. Pod keys can
|
||||
// contain characters nft doesn't allow in identifiers, so we hash them.
|
||||
// Direction keeps ingress and egress separate.
|
||||
func chainName(podKey string, dir Direction) string {
|
||||
h := fnv.New64a()
|
||||
_, _ = h.Write([]byte(podKey))
|
||||
return fmt.Sprintf("pod_%016x_%s", h.Sum64(), dir)
|
||||
}
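To make the naming scheme concrete, here is a standalone restatement of chainName. It takes the direction as a plain string (`"ingress"` / `"egress"`) because the package's `Direction` type is not shown here; `chainNameSketch` is a hypothetical name. Pod keys such as `ns/web` contain `/`, which is why the key is hashed rather than embedded:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// chainNameSketch mirrors chainName: FNV-64a of the pod key, zero-padded to
// 16 hex digits, framed by "pod_" and the direction.
func chainNameSketch(podKey, dir string) string {
	h := fnv.New64a()
	_, _ = h.Write([]byte(podKey)) // writes to an FNV hash never fail
	return fmt.Sprintf("pod_%016x_%s", h.Sum64(), dir)
}
```

The `%016x` padding gives every name the same length (28 characters for ingress), and the same key always yields the same name, which keeps Render output byte-stable.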
|
||||
|
||||
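// writeChain emits the chain definition. Regular (non-base) nft chains
// cannot carry a policy, so default-deny is a trailing `drop` rule; an
// isolated chain with no allow rules deliberately contains only that drop.
// A chain with policy "accept" gets no trailing rule, so unmatched packets
// return to the base chain and hit its accept policy.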
|
||||
func writeChain(sb *strings.Builder, c chain) {
|
||||
fmt.Fprintf(sb, "\tchain %s {\n", c.name)
|
||||
for _, r := range c.rules {
|
||||
writeAllowRule(sb, r)
|
||||
}
|
||||
if c.policy == "drop" {
|
||||
sb.WriteString("\t\tdrop\n")
|
||||
}
|
||||
sb.WriteString("\t}\n")
|
||||
}
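The control flow that writeChain relies on can be shown with a toy model. This is not nft; it assumes a hypothetical `allowRule` predicate per rendered accept line (here it only inspects the destination port) and shows that first match wins, the trailing drop only fires when nothing matched, and a chain without it falls back to its caller:

```go
package main

// verdict is the outcome of evaluating one regular chain in this toy model.
type verdict int

const (
	verdictContinue verdict = iota // fell off the end: control returns to the base chain
	verdictAccept                  // an allow rule matched
	verdictDrop                    // the trailing drop rule fired
)

// allowRule is a hypothetical stand-in for one rendered accept line.
type allowRule func(dport int) bool

// evalPodChain walks rules in order (first match wins) and applies the
// trailing drop only when the chain was rendered with policy "drop".
func evalPodChain(rules []allowRule, trailingDrop bool, dport int) verdict {
	for _, r := range rules {
		if r(dport) {
			return verdictAccept
		}
	}
	if trailingDrop {
		return verdictDrop
	}
	return verdictContinue
}
```

An isolated chain with no rules (`evalPodChain(nil, true, p)`) drops everything, which is exactly the default-deny case.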
|
||||
|
||||
// writeAllowRule emits one accept line:
|
||||
//
|
||||
// [ip|ip6 saddr {peers}] [ip|ip6 saddr != {except}] [proto dport {port|port-end}] accept
|
||||
//
|
||||
// The saddr / daddr field flips based on direction (ingress = from peer →
|
||||
// match saddr; egress = to peer → match daddr).
|
||||
func writeAllowRule(sb *strings.Builder, r Rule) {
|
||||
v6Peers, v4Peers := splitFamily(r.PeerCIDRs)
|
||||
v6Except, v4Except := splitFamily(r.PeerExcept)
|
||||
v6Pod, v4Pod := splitIPFamily(r.PodIPs)
|
||||
hasPeerFilter := len(r.PeerCIDRs) > 0
|
||||
|
||||
emit := func(family string, peers, except []*net.IPNet, podIP net.IP) {
|
||||
if hasPeerFilter && len(peers) == 0 {
// The rule has a peer filter but no peers of this family, so it must
// not match this family at all. An except list on its own must not be
// rendered, or "saddr != { except }" would allow every other address
// of this family.
return
}
|
||||
if podIP == nil {
|
||||
// Pod has no address of this family; nothing to guard.
|
||||
return
|
||||
}
|
||||
for _, port := range r.Ports {
|
||||
sb.WriteString("\t\t")
|
||||
// Peer (saddr/daddr) match: address is "peer's address",
|
||||
// which is saddr on ingress and daddr on egress.
|
||||
peerField := peerAddrField(family, r.Direction)
|
||||
if hasPeerFilter && len(peers) > 0 {
|
||||
fmt.Fprintf(sb, "%s { %s } ", peerField, joinCIDRs(peers))
|
||||
}
|
||||
if hasPeerFilter && len(except) > 0 {
|
||||
fmt.Fprintf(sb, "%s != { %s } ", peerField, joinCIDRs(except))
|
||||
}
|
||||
// Port match.
|
||||
writePortMatch(sb, port)
|
||||
fmt.Fprintf(sb, "%s\n", r.Action)
|
||||
}
|
||||
}
|
||||
emit("ip6", v6Peers, v6Except, v6Pod)
|
||||
emit("ip", v4Peers, v4Except, v4Pod)
|
||||
}
|
||||
|
||||
// peerAddrField returns "ip6 saddr" / "ip saddr" / "ip6 daddr" / "ip daddr"
|
||||
// depending on family + direction. Ingress matches the peer as the source;
|
||||
// egress matches the peer as the destination.
|
||||
func peerAddrField(family string, dir Direction) string {
|
||||
switch {
|
||||
case dir == DirIngress:
|
||||
return family + " saddr"
|
||||
default:
|
||||
return family + " daddr"
|
||||
}
|
||||
}
|
||||
|
||||
// writePortMatch appends "tcp dport 80 " (single port) or
|
||||
// "tcp dport 8000-8999 " (range), or nothing when port is "any".
|
||||
func writePortMatch(sb *strings.Builder, p PortMatch) {
|
||||
if p.Port == 0 && p.Protocol == "" {
|
||||
return
|
||||
}
|
||||
proto := p.Protocol
|
||||
if proto == "" {
|
||||
proto = "tcp"
|
||||
}
|
||||
if p.Port == 0 {
|
||||
// Protocol-only match. nft has `meta l4proto tcp`.
|
||||
fmt.Fprintf(sb, "meta l4proto %s ", proto)
|
||||
return
|
||||
}
|
||||
if p.EndPort > p.Port {
|
||||
fmt.Fprintf(sb, "%s dport %d-%d ", proto, p.Port, p.EndPort)
|
||||
return
|
||||
}
|
||||
fmt.Fprintf(sb, "%s dport %d ", proto, p.Port)
|
||||
}
|
||||
|
||||
// writeBaseJump emits one line per (pod, direction) chain in the base
|
||||
// `forward` chain. The match is anchored on the host-side veth name so
|
||||
// the rule only fires for traffic that genuinely crosses this pod's veth.
|
||||
//
|
||||
// We additionally constrain on the pod's address (saddr for egress, daddr
|
||||
// for ingress) so a packet that somehow hits the wrong veth — e.g. during
|
||||
// a CNI ADD race — won't be policy-evaluated against the wrong pod.
|
||||
func writeBaseJump(sb *strings.Builder, c chain) {
|
||||
v6, v4 := splitIPFamily(c.podIPs)
|
||||
emit := func(family string, ip net.IP) {
|
||||
if ip == nil {
|
||||
return
|
||||
}
|
||||
var iface, addrField, addrStr string
|
||||
if c.direction == DirEgress {
|
||||
iface = "iifname"
|
||||
addrField = family + " saddr"
|
||||
} else {
|
||||
iface = "oifname"
|
||||
addrField = family + " daddr"
|
||||
}
|
||||
if family == "ip" {
|
||||
addrStr = ip.To4().String()
|
||||
} else {
|
||||
addrStr = ip.To16().String()
|
||||
}
|
||||
fmt.Fprintf(sb, "\t\t%s \"%s\" %s %s jump %s\n", iface, c.hostIface, addrField, addrStr, c.name)
|
||||
}
|
||||
emit("ip6", v6)
|
||||
emit("ip", v4)
|
||||
}
|
||||
|
||||
// splitFamily partitions CIDRs into (v6, v4) lists, preserving order
|
||||
// within each family.
|
||||
func splitFamily(cs []*net.IPNet) ([]*net.IPNet, []*net.IPNet) {
|
||||
var v6, v4 []*net.IPNet
|
||||
for _, c := range cs {
|
||||
if c.IP.To4() != nil {
|
||||
v4 = append(v4, c)
|
||||
} else {
|
||||
v6 = append(v6, c)
|
||||
}
|
||||
}
|
||||
return v6, v4
|
||||
}
|
||||
|
||||
// splitIPFamily picks one v6 and one v4 from a list of pod IPs (a pod has
|
||||
// at most one of each in flock's model).
|
||||
func splitIPFamily(ips []net.IP) (v6, v4 net.IP) {
|
||||
for _, ip := range ips {
|
||||
if ip == nil {
|
||||
continue
|
||||
}
|
||||
if ip.To4() != nil {
|
||||
if v4 == nil {
|
||||
v4 = ip
|
||||
}
|
||||
} else {
|
||||
if v6 == nil {
|
||||
v6 = ip
|
||||
}
|
||||
}
|
||||
}
|
||||
return
|
||||
}
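One detail of the `To4()` test shared by splitFamily and splitIPFamily: `net.IP.To4` also succeeds for IPv4-mapped IPv6 addresses (`::ffff:a.b.c.d`), so such an address is classified as IPv4 and rendered under `ip`, not `ip6`. The hypothetical helper `familyOf` below makes that classification explicit:

```go
package main

import "net"

// familyOf reports the nft family ("ip" or "ip6") that the To4 test used by
// splitFamily / splitIPFamily assigns to the textual address s.
func familyOf(s string) (string, bool) {
	ip := net.ParseIP(s)
	if ip == nil {
		return "", false // not an IP address
	}
	if ip.To4() != nil {
		return "ip", true // includes IPv4-mapped IPv6 addresses
	}
	return "ip6", true
}
```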
|
||||
|
||||
func joinCIDRs(cs []*net.IPNet) string {
|
||||
parts := make([]string, len(cs))
|
||||
for i, c := range cs {
|
||||
parts[i] = c.String()
|
||||
}
|
||||
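sort.Strings(parts)
// Drop exact duplicates (several peers in one rule can match the same
// pod); repeated elements in an nft set literal are redundant.
uniq := parts[:0]
for _, s := range parts {
if len(uniq) == 0 || s != uniq[len(uniq)-1] {
uniq = append(uniq, s)
}
}
parts = uniq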
|
||||
return strings.Join(parts, ", ")
|
||||
}
|
||||
@@ -0,0 +1,219 @@
|
||||
package netpol
|
||||
|
||||
import (
|
||||
"net"
|
||||
"strings"
|
||||
"testing"
|
||||
)
|
||||
|
||||
// TestRender_DefaultDeny — an isolated direction with no rules renders
|
||||
// to a chain whose last action is "drop".
|
||||
func TestRender_DefaultDeny(t *testing.T) {
|
||||
out := Output{
|
||||
Isolated: map[Isolation]struct{}{
|
||||
{PodKey: "ns/web", Direction: DirIngress}: {},
|
||||
},
|
||||
// No allow rules: the chain's HostIface and PodIPs come from Pods,
// which is how the translator represents a pure default-deny policy.
// (An allow-all rule here would make the trailing drop unreachable.)
Pods: map[string]LocalPod{
"ns/web": {PodKey: "ns/web", HostIface: "flock00000001", IPs: []net.IP{mustIP("2001:db8::1")}},
},
|
||||
}
|
||||
got := Render(out)
|
||||
if !strings.Contains(got, "table inet flock_netpol") {
|
||||
t.Fatalf("missing table:\n%s", got)
|
||||
}
|
||||
if !strings.Contains(got, "type filter hook forward") {
|
||||
t.Fatalf("missing base chain:\n%s", got)
|
||||
}
|
||||
// The drop must be the chain's final statement.
if !strings.Contains(got, "\t\tdrop\n\t}\n") {
t.Fatalf("expected default-deny drop as the chain's last statement:\n%s", got)
|
||||
}
|
||||
// Pod chain name must be deterministic-looking (pod_<hex>_ingress).
|
||||
if !strings.Contains(got, "_ingress {") {
|
||||
t.Fatalf("missing pod ingress chain:\n%s", got)
|
||||
}
|
||||
// Base chain jump anchored on veth + pod IP.
|
||||
if !strings.Contains(got, `oifname "flock00000001"`) {
|
||||
t.Fatalf("missing veth match in base chain:\n%s", got)
|
||||
}
|
||||
if !strings.Contains(got, "ip6 daddr 2001:db8::1") {
|
||||
t.Fatalf("missing pod IP match in base chain:\n%s", got)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRender_DualStack — pod with both v6 + v4 IPs gets two base-chain
|
||||
// jumps.
|
||||
func TestRender_DualStack(t *testing.T) {
|
||||
out := Output{
|
||||
Isolated: map[Isolation]struct{}{
|
||||
{PodKey: "ns/web", Direction: DirIngress}: {},
|
||||
},
|
||||
Rules: []Rule{{
|
||||
PodKey: "ns/web", HostIface: "f1",
|
||||
PodIPs: []net.IP{mustIP("2001:db8::1"), mustIP("10.0.0.1")},
|
||||
Direction: DirIngress, Action: ActionAccept,
|
||||
Ports: []PortMatch{{Protocol: "tcp", Port: 80}},
|
||||
}},
|
||||
}
|
||||
got := Render(out)
|
||||
if !strings.Contains(got, "ip6 daddr 2001:db8::1") {
|
||||
t.Fatalf("missing v6 jump:\n%s", got)
|
||||
}
|
||||
if !strings.Contains(got, "ip daddr 10.0.0.1") {
|
||||
t.Fatalf("missing v4 jump:\n%s", got)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRender_PortAndPeer — a Rule with peer + port emits a syntactically
|
||||
// well-formed allow line.
|
||||
func TestRender_PortAndPeer(t *testing.T) {
|
||||
out := Output{
|
||||
Isolated: map[Isolation]struct{}{
|
||||
{PodKey: "ns/web", Direction: DirIngress}: {},
|
||||
},
|
||||
Rules: []Rule{{
|
||||
PodKey: "ns/web", HostIface: "f1",
|
||||
PodIPs: []net.IP{mustIP("2001:db8::1")},
|
||||
Direction: DirIngress, Action: ActionAccept,
|
||||
PeerCIDRs: []*net.IPNet{mustNet("2001:db8::a/128")},
|
||||
Ports: []PortMatch{{Protocol: "tcp", Port: 80}},
|
||||
}},
|
||||
}
|
||||
got := Render(out)
|
||||
if !strings.Contains(got, "ip6 saddr { 2001:db8::a/128 } tcp dport 80 accept") {
|
||||
t.Fatalf("expected ingress allow with v6 peer + tcp/80:\n%s", got)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRender_PortRange — endPort renders as "8000-8999".
|
||||
func TestRender_PortRange(t *testing.T) {
|
||||
out := Output{
|
||||
Isolated: map[Isolation]struct{}{
|
||||
{PodKey: "ns/web", Direction: DirIngress}: {},
|
||||
},
|
||||
Rules: []Rule{{
|
||||
PodKey: "ns/web", HostIface: "f1",
|
||||
PodIPs: []net.IP{mustIP("2001:db8::1")},
|
||||
Direction: DirIngress, Action: ActionAccept,
|
||||
PeerCIDRs: []*net.IPNet{mustNet("0.0.0.0/0"), mustNet("::/0")},
|
||||
Ports: []PortMatch{{Protocol: "tcp", Port: 8000, EndPort: 8999}},
|
||||
}},
|
||||
}
|
||||
got := Render(out)
|
||||
if !strings.Contains(got, "tcp dport 8000-8999") {
|
||||
t.Fatalf("expected port range:\n%s", got)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRender_IPBlockExcept — except produces a "saddr != { … }" guard.
|
||||
func TestRender_IPBlockExcept(t *testing.T) {
|
||||
out := Output{
|
||||
Isolated: map[Isolation]struct{}{
|
||||
{PodKey: "ns/web", Direction: DirIngress}: {},
|
||||
},
|
||||
Rules: []Rule{{
|
||||
PodKey: "ns/web", HostIface: "f1",
|
||||
PodIPs: []net.IP{mustIP("10.0.0.1")},
|
||||
Direction: DirIngress, Action: ActionAccept,
|
||||
PeerCIDRs: []*net.IPNet{mustNet("10.0.0.0/8")},
|
||||
PeerExcept: []*net.IPNet{mustNet("10.99.0.0/16")},
|
||||
Ports: []PortMatch{{}},
|
||||
}},
|
||||
}
|
||||
got := Render(out)
|
||||
if !strings.Contains(got, "ip saddr { 10.0.0.0/8 }") {
|
||||
t.Fatalf("expected ipBlock cidr:\n%s", got)
|
||||
}
|
||||
if !strings.Contains(got, "ip saddr != { 10.99.0.0/16 }") {
|
||||
t.Fatalf("expected ipBlock except:\n%s", got)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRender_AllowAllPeers — empty PeerCIDRs/PeerExcept means "any peer";
|
||||
// the rule should emit an unconditional accept (modulo port).
|
||||
func TestRender_AllowAllPeers(t *testing.T) {
|
||||
out := Output{
|
||||
Isolated: map[Isolation]struct{}{
|
||||
{PodKey: "ns/web", Direction: DirIngress}: {},
|
||||
},
|
||||
Rules: []Rule{{
|
||||
PodKey: "ns/web", HostIface: "f1",
|
||||
PodIPs: []net.IP{mustIP("2001:db8::1")},
|
||||
Direction: DirIngress, Action: ActionAccept,
|
||||
Ports: []PortMatch{{Protocol: "tcp", Port: 443}},
|
||||
}},
|
||||
}
|
||||
got := Render(out)
|
||||
if !strings.Contains(got, "tcp dport 443 accept") {
|
||||
t.Fatalf("expected unconditional tcp/443 allow:\n%s", got)
|
||||
}
|
||||
// Should NOT have a saddr/daddr filter (empty peers).
|
||||
if strings.Contains(got, "ip6 saddr {") || strings.Contains(got, "ip saddr {") {
|
||||
t.Fatalf("expected no peer filter:\n%s", got)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRender_Determinism — same input → byte-identical output.
|
||||
func TestRender_Determinism(t *testing.T) {
|
||||
out := Output{
|
||||
Isolated: map[Isolation]struct{}{
|
||||
{PodKey: "ns/web", Direction: DirIngress}: {},
|
||||
{PodKey: "ns/db", Direction: DirEgress}: {},
|
||||
},
|
||||
Rules: []Rule{
|
||||
{PodKey: "ns/web", HostIface: "f1", PodIPs: []net.IP{mustIP("2001:db8::1")},
|
||||
Direction: DirIngress, Action: ActionAccept,
|
||||
PeerCIDRs: []*net.IPNet{mustNet("2001:db8::5/128"), mustNet("2001:db8::3/128")},
|
||||
Ports: []PortMatch{{Protocol: "tcp", Port: 80}}},
|
||||
{PodKey: "ns/db", HostIface: "f2", PodIPs: []net.IP{mustIP("2001:db8::2")},
|
||||
Direction: DirEgress, Action: ActionAccept,
|
||||
PeerCIDRs: []*net.IPNet{mustNet("2001:db8::aa/128")},
|
||||
Ports: []PortMatch{{}}},
|
||||
},
|
||||
}
|
||||
a := Render(out)
|
||||
b := Render(out)
|
||||
if a != b {
|
||||
t.Fatalf("Render not deterministic:\nA=\n%s\nB=\n%s", a, b)
|
||||
}
|
||||
// And peers in the rule must be sorted (we deliberately gave 5 then 3).
|
||||
i3, i5 := strings.Index(a, "2001:db8::3/128"), strings.Index(a, "2001:db8::5/128")
if i3 < 0 || i5 < 0 || i3 > i5 {
|
||||
t.Fatalf("peer CIDRs not sorted within rule:\n%s", a)
|
||||
}
|
||||
}
|
||||
|
||||
// TestRender_EgressDirection — egress rules use iifname + saddr (pod-side).
|
||||
func TestRender_EgressDirection(t *testing.T) {
|
||||
out := Output{
|
||||
Isolated: map[Isolation]struct{}{
|
||||
{PodKey: "ns/web", Direction: DirEgress}: {},
|
||||
},
|
||||
Rules: []Rule{{
|
||||
PodKey: "ns/web", HostIface: "f1",
|
||||
PodIPs: []net.IP{mustIP("2001:db8::1")},
|
||||
Direction: DirEgress, Action: ActionAccept,
|
||||
PeerCIDRs: []*net.IPNet{mustNet("2001:db8::aa/128")},
|
||||
Ports: []PortMatch{{Protocol: "tcp", Port: 53}},
|
||||
}},
|
||||
}
|
||||
got := Render(out)
|
||||
// Base-chain jump for egress matches iifname + ip6 saddr (pod's IP).
|
||||
if !strings.Contains(got, `iifname "f1" ip6 saddr 2001:db8::1`) {
|
||||
t.Fatalf("missing egress base-chain jump:\n%s", got)
|
||||
}
|
||||
// Peer filter for egress matches the *destination* (the peer is downstream).
|
||||
if !strings.Contains(got, "ip6 daddr { 2001:db8::aa/128 }") {
|
||||
t.Fatalf("expected daddr peer filter for egress:\n%s", got)
|
||||
}
|
||||
}
|
||||
|
||||
func mustNet(s string) *net.IPNet {
|
||||
_, n, err := net.ParseCIDR(s)
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
return n
|
||||
}
|
||||
@@ -0,0 +1,443 @@
|
||||
package netpol
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"net"
|
||||
"sort"
|
||||
|
||||
netv1 "k8s.io/api/networking/v1"
|
||||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
|
||||
"k8s.io/apimachinery/pkg/labels"
|
||||
)
|
||||
|
||||
// Inputs is the world-view the translator consumes. All fields are owned
|
||||
// by the caller; the translator does not mutate them.
|
||||
type Inputs struct {
|
||||
// LocalPods are the pods scheduled on this node that have a committed
|
||||
// flock allocation. Only these pods get rules — peers may live
|
||||
// elsewhere.
|
||||
LocalPods []Pod
|
||||
|
||||
// PeerPods is the cluster-wide pod set used to resolve podSelector +
|
||||
// namespaceSelector peers. It is fine to include the local pods here
|
||||
// too; duplicates are deduped by (namespace, name).
|
||||
PeerPods []PeerPod
|
||||
|
||||
// Namespaces is the cluster's full Namespace set. Used for
|
||||
// namespaceSelector matching.
|
||||
Namespaces []Namespace
|
||||
|
||||
// Policies is every NetworkPolicy in the cluster. The translator
|
||||
// filters down to those that select at least one local pod.
|
||||
Policies []netv1.NetworkPolicy
|
||||
}
|
||||
|
||||
// Output is the result of one translation pass.
|
||||
type Output struct {
|
||||
// Rules is the flat ordered list of allow rules to render. The
|
||||
// renderer groups them by (PodKey, Direction) into chains.
|
||||
Rules []Rule
|
||||
|
||||
// Isolated is the set of (PodKey, Direction) pairs whose chain must
|
||||
// have a default-deny policy. A pod selected by at least one policy
|
||||
// in a given direction shows up here. The renderer uses this to
|
||||
// decide whether to emit a chain at all and what its base policy is.
|
||||
Isolated map[Isolation]struct{}
|
||||
|
||||
// Pods carries the HostIface + IPs for every local pod referenced
|
||||
// by the policy world, including pods that produced only isolation
|
||||
// (default-deny) without any allow rules. The renderer needs this
|
||||
// because such a pod has no Rule to lift the HostIface from.
|
||||
Pods map[string]LocalPod // key = namespace/name
|
||||
}
|
||||
|
||||
// Isolation is the (PodKey, Direction) key of the Isolated map.
|
||||
type Isolation struct {
|
||||
PodKey string
|
||||
Direction Direction
|
||||
}
|
||||
|
||||
// Translate runs the translation pass. It is a pure function: same Inputs
|
||||
// always produces semantically equal Output. (Order of slices is stable
|
||||
// but Rules within a chain follow the order in which selecting policies
|
||||
// appear, which is itself sorted; see canonicalisePolicies.)
|
||||
//
|
||||
// The error return is reserved for unrecoverable malformed input; the
// current implementation never returns a non-nil error. Per-rule
// translation errors are reported via warn and skipped so that a single
// broken policy can't take down enforcement for a whole node. warn is
// invoked once per skipped sub-rule with a human-readable message; pass
// nil to drop these messages silently.
|
||||
func Translate(in Inputs, warn func(string)) (Output, error) {
|
||||
if warn == nil {
|
||||
warn = func(string) {}
|
||||
}
|
||||
|
||||
out := Output{
|
||||
Isolated: map[Isolation]struct{}{},
|
||||
Pods: map[string]LocalPod{},
|
||||
}
|
||||
policies := canonicalisePolicies(in.Policies)
|
||||
nsByName := indexNamespaces(in.Namespaces)
|
||||
peerPodsByNS := indexPeerPods(in.PeerPods)
|
||||
|
||||
for _, pod := range in.LocalPods {
|
||||
if len(pod.IPs) == 0 {
|
||||
continue // no allocation yet; translator skips
|
||||
}
|
||||
key := pod.Namespace + "/" + pod.Name
|
||||
|
||||
// Find every policy in pod.Namespace whose podSelector matches.
|
||||
// Cross-namespace policies do not select pods outside their own
|
||||
// namespace; that's how the NetworkPolicy spec defines it.
|
||||
for _, p := range policies {
|
||||
if p.Namespace != pod.Namespace {
|
||||
continue
|
||||
}
|
||||
sel, err := metav1.LabelSelectorAsSelector(&p.Spec.PodSelector)
|
||||
if err != nil {
|
||||
warn(fmt.Sprintf("policy %s/%s: invalid podSelector: %v", p.Namespace, p.Name, err))
|
||||
continue
|
||||
}
|
||||
if !sel.Matches(labels.Set(pod.Labels)) {
|
||||
continue
|
||||
}
|
||||
|
||||
ingress, egress := policyDirections(&p)
|
||||
if ingress || egress {
|
||||
out.Pods[key] = LocalPod{
|
||||
PodKey: key,
|
||||
HostIface: pod.HostIface,
|
||||
IPs: append([]net.IP(nil), pod.IPs...),
|
||||
}
|
||||
}
|
||||
if ingress {
|
||||
out.Isolated[Isolation{PodKey: key, Direction: DirIngress}] = struct{}{}
|
||||
}
|
||||
if egress {
|
||||
out.Isolated[Isolation{PodKey: key, Direction: DirEgress}] = struct{}{}
|
||||
}
|
||||
|
||||
// Translate ingress rules.
|
||||
if ingress {
|
||||
for ri, r := range p.Spec.Ingress {
|
||||
rules, err := buildIngressRules(pod, r, p.Namespace, nsByName, peerPodsByNS)
|
||||
if err != nil {
|
||||
warn(fmt.Sprintf("policy %s/%s ingress[%d]: %v", p.Namespace, p.Name, ri, err))
|
||||
continue
|
||||
}
|
||||
out.Rules = append(out.Rules, rules...)
|
||||
}
|
||||
}
|
||||
// Translate egress rules.
|
||||
if egress {
|
||||
for ri, r := range p.Spec.Egress {
|
||||
rules, err := buildEgressRules(pod, r, p.Namespace, nsByName, peerPodsByNS)
|
||||
if err != nil {
|
||||
warn(fmt.Sprintf("policy %s/%s egress[%d]: %v", p.Namespace, p.Name, ri, err))
|
||||
continue
|
||||
}
|
||||
out.Rules = append(out.Rules, rules...)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
return out, nil
|
||||
}
|
||||
|
||||
// policyDirections reports which directions a NetworkPolicy isolates.
|
||||
//
|
||||
// Per the spec, the PolicyTypes field is the source of truth when set;
|
||||
// when omitted, isolation is inferred from which rule lists are populated
|
||||
// (Ingress always; Egress only if Spec.Egress is non-empty).
|
||||
func policyDirections(p *netv1.NetworkPolicy) (ingress, egress bool) {
|
||||
if len(p.Spec.PolicyTypes) > 0 {
|
||||
for _, t := range p.Spec.PolicyTypes {
|
||||
switch t {
|
||||
case netv1.PolicyTypeIngress:
|
||||
ingress = true
|
||||
case netv1.PolicyTypeEgress:
|
||||
egress = true
|
||||
}
|
||||
}
|
||||
return
|
||||
}
|
||||
ingress = true
|
||||
egress = len(p.Spec.Egress) > 0
|
||||
return
|
||||
}
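The inference rule is easy to misread, so here it is restated with plain types and worked cases. `policyDirectionsSketch` is a hypothetical helper: `policyTypes` holds the raw `"Ingress"` / `"Egress"` strings, and `hasEgressRules` stands in for `len(Spec.Egress) > 0`:

```go
package main

// policyDirectionsSketch restates policyDirections without the Kubernetes
// API types.
func policyDirectionsSketch(policyTypes []string, hasEgressRules bool) (ingress, egress bool) {
	if len(policyTypes) > 0 {
		// Explicit PolicyTypes are authoritative.
		for _, t := range policyTypes {
			switch t {
			case "Ingress":
				ingress = true
			case "Egress":
				egress = true
			}
		}
		return
	}
	// PolicyTypes omitted: ingress is always isolated; egress only when the
	// policy carries egress rules.
	return true, hasEgressRules
}
```

So a bare policy with only a `podSelector` isolates ingress alone, while `policyTypes: [Egress]` with no egress rules isolates egress without touching ingress.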
|
||||
|
||||
// buildIngressRules expands one NetworkPolicyIngressRule into Rule(s).
|
||||
// One Rule per allowed peer-set; each Rule carries the full Ports filter
|
||||
// from the source rule.
|
||||
func buildIngressRules(
|
||||
pod Pod,
|
||||
r netv1.NetworkPolicyIngressRule,
|
||||
policyNS string,
|
||||
nsByName map[string]Namespace,
|
||||
peerPodsByNS map[string][]PeerPod,
|
||||
) ([]Rule, error) {
|
||||
ports, err := translatePorts(r.Ports)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
peers, err := translatePeers(r.From, policyNS, nsByName, peerPodsByNS)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
return assembleRules(pod, DirIngress, peers, ports), nil
|
||||
}
|
||||
|
||||
// buildEgressRules is the egress mirror of buildIngressRules.
|
||||
func buildEgressRules(
|
||||
pod Pod,
|
||||
r netv1.NetworkPolicyEgressRule,
|
||||
policyNS string,
|
||||
nsByName map[string]Namespace,
|
||||
peerPodsByNS map[string][]PeerPod,
|
||||
) ([]Rule, error) {
|
||||
ports, err := translatePorts(r.Ports)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
peers, err := translatePeers(r.To, policyNS, nsByName, peerPodsByNS)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
return assembleRules(pod, DirEgress, peers, ports), nil
|
||||
}
|
||||
|
||||
// peerSet is the resolved peer information for one rule's From / To list.
|
||||
type peerSet struct {
|
||||
// allowAll is true when the rule has no peers at all: the spec defines an
// empty From / To list as "all sources" / "all destinations". It
// overrides CIDRs and Except.
|
||||
allowAll bool
|
||||
// CIDRs is the union of every IP / CIDR contributed by the rule's
|
||||
// peer entries (resolved Pod IPs, namespace pods, and ipBlock.cidr).
|
||||
CIDRs []*net.IPNet
|
||||
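// Except is the union of every ipBlock.except entry across the rule.
// Note: the renderer applies this union to every entry in CIDRs, which is
// stricter than the spec when a rule mixes an ipBlock that has except
// entries with other peers that fall inside an excluded range; the spec
// scopes each except list to its own ipBlock.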
|
||||
Except []*net.IPNet
|
||||
}
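The difference between merging excepts into one peerSet and the spec's per-peer reading can be shown with a small model using `net/netip`. The types and helpers below (`peer`, `newPeer`, `allowedPerPeer`, `allowedMerged`) are hypothetical and only address-level; ports and families are ignored:

```go
package main

import "net/netip"

// peer is one resolved NetworkPolicyPeer in this model: the prefixes it
// allows plus (for an ipBlock) its own except list.
type peer struct {
	allow  []netip.Prefix
	except []netip.Prefix
}

// newPeer parses textual prefixes; it panics on malformed input.
func newPeer(allow, except []string) peer {
	var p peer
	for _, s := range allow {
		p.allow = append(p.allow, netip.MustParsePrefix(s))
	}
	for _, s := range except {
		p.except = append(p.except, netip.MustParsePrefix(s))
	}
	return p
}

func containsAny(ps []netip.Prefix, a netip.Addr) bool {
	for _, p := range ps {
		if p.Contains(a) {
			return true
		}
	}
	return false
}

// allowedPerPeer follows the spec: peers are ORed, and each except list
// only narrows its own ipBlock.
func allowedPerPeer(peers []peer, addr string) bool {
	a := netip.MustParseAddr(addr)
	for _, p := range peers {
		if containsAny(p.allow, a) && !containsAny(p.except, a) {
			return true
		}
	}
	return false
}

// allowedMerged models peerSet as rendered: one union of CIDRs, one union
// of excepts, with every except applied to every CIDR.
func allowedMerged(peers []peer, addr string) bool {
	a := netip.MustParseAddr(addr)
	var allow, except []netip.Prefix
	for _, p := range peers {
		allow = append(allow, p.allow...)
		except = append(except, p.except...)
	}
	return containsAny(allow, a) && !containsAny(except, a)
}
```

With an ipBlock `10.0.0.0/8` except `10.99.0.0/16` plus a podSelector peer that resolved to pod `10.99.0.5`, the spec allows that pod but the merged form denies it; addresses outside the exception behave the same under both.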
|
||||
|
||||
// translatePeers resolves a list of NetworkPolicyPeer entries into a
|
||||
// peerSet. Each peer entry contributes either CIDRs (resolved from
|
||||
// pod / namespace selectors, or copied from ipBlock) or Except entries.
|
||||
func translatePeers(
|
||||
peers []netv1.NetworkPolicyPeer,
|
||||
policyNS string,
|
||||
nsByName map[string]Namespace,
|
||||
peerPodsByNS map[string][]PeerPod,
|
||||
) (peerSet, error) {
|
||||
if len(peers) == 0 {
|
||||
return peerSet{allowAll: true}, nil
|
||||
}
|
||||
out := peerSet{}
|
||||
for i, p := range peers {
|
||||
switch {
|
||||
case p.IPBlock != nil:
|
||||
_, cidr, err := net.ParseCIDR(p.IPBlock.CIDR)
|
||||
if err != nil {
|
||||
return peerSet{}, fmt.Errorf("peer[%d] ipBlock.cidr %q: %w", i, p.IPBlock.CIDR, err)
|
||||
}
|
||||
out.CIDRs = append(out.CIDRs, cidr)
|
||||
for j, ex := range p.IPBlock.Except {
|
||||
_, exNet, err := net.ParseCIDR(ex)
|
||||
if err != nil {
|
||||
return peerSet{}, fmt.Errorf("peer[%d] ipBlock.except[%d] %q: %w", i, j, ex, err)
|
||||
}
|
||||
out.Except = append(out.Except, exNet)
|
||||
}
|
||||
case p.PodSelector != nil || p.NamespaceSelector != nil:
|
||||
ips, err := resolvePodNamespacePeer(p, policyNS, nsByName, peerPodsByNS)
|
||||
if err != nil {
|
||||
return peerSet{}, fmt.Errorf("peer[%d]: %w", i, err)
|
||||
}
|
||||
out.CIDRs = append(out.CIDRs, ips...)
|
||||
default:
|
||||
return peerSet{}, fmt.Errorf("peer[%d] is empty (must set ipBlock, podSelector, or namespaceSelector)", i)
|
||||
}
|
||||
}
|
||||
return out, nil
|
||||
}
|
||||
|
||||
// resolvePodNamespacePeer walks the cluster's peer-pod set and returns
|
||||
// /128 (v6) and /32 (v4) CIDRs for each pod that matches the (possibly
|
||||
// combined) pod + namespace selectors.
|
||||
//
|
||||
// Selector semantics from the NetworkPolicy spec:
|
||||
//
|
||||
// - podSelector + namespaceSelector both nil → handled upstream.
|
||||
// - podSelector set, namespaceSelector nil → match in the policy's
|
||||
// own namespace.
|
||||
// - podSelector nil, namespaceSelector set → match every pod in
|
||||
// namespaces that match the namespaceSelector.
|
||||
// - both set → AND: pod must be in a matching namespace AND match
|
||||
// the podSelector.
|
||||
//
|
||||
// An empty (non-nil) selector matches everything in scope.
|
||||
func resolvePodNamespacePeer(
|
||||
p netv1.NetworkPolicyPeer,
|
||||
policyNS string,
|
||||
nsByName map[string]Namespace,
|
||||
peerPodsByNS map[string][]PeerPod,
|
||||
) ([]*net.IPNet, error) {
|
||||
var podSel, nsSel labels.Selector
|
||||
if p.PodSelector != nil {
|
||||
s, err := metav1.LabelSelectorAsSelector(p.PodSelector)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("podSelector: %w", err)
|
||||
}
|
||||
podSel = s
|
||||
}
|
||||
if p.NamespaceSelector != nil {
|
||||
s, err := metav1.LabelSelectorAsSelector(p.NamespaceSelector)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("namespaceSelector: %w", err)
|
||||
}
|
||||
nsSel = s
|
||||
}
|
||||
|
||||
// Decide which namespaces are in scope.
|
||||
var inScope []string
|
||||
if nsSel == nil {
|
||||
// Pod-only selector → just the policy's own namespace.
|
||||
inScope = []string{policyNS}
|
||||
} else {
|
||||
for name, ns := range nsByName {
|
||||
if nsSel.Matches(labels.Set(ns.Labels)) {
|
||||
inScope = append(inScope, name)
|
||||
}
|
||||
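}
// nsByName is a map, so iteration order is random; sort for a stable
// PeerCIDRs order.
sort.Strings(inScope)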
|
||||
}
|
||||
|
||||
var out []*net.IPNet
|
||||
for _, ns := range inScope {
|
||||
for _, pp := range peerPodsByNS[ns] {
|
||||
if podSel != nil && !podSel.Matches(labels.Set(pp.Labels)) {
|
||||
continue
|
||||
}
|
||||
for _, ip := range pp.IPs {
|
||||
out = append(out, ipToHostCIDR(ip))
|
||||
}
|
||||
}
|
||||
}
|
||||
return out, nil
|
||||
}
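The namespace-scoping step above can be sketched with equality-only label matching in place of `labels.Selector`. `labelsMatch` and `namespacesInScope` are hypothetical helpers; a nil selector stands for "namespaceSelector not set" and an empty, non-nil one for "match everything":

```go
package main

import "sort"

// labelsMatch is an equality-only stand-in for labels.Selector.Matches:
// every key/value in sel must be present in lbls. An empty selector
// matches everything.
func labelsMatch(sel, lbls map[string]string) bool {
	for k, v := range sel {
		if got, ok := lbls[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// namespacesInScope mirrors the scoping step of resolvePodNamespacePeer:
// nil nsSel means "the policy's own namespace only"; otherwise every
// namespace whose labels match, sorted for stable output.
func namespacesInScope(policyNS string, nsSel map[string]string, nsLabels map[string]map[string]string) []string {
	if nsSel == nil {
		return []string{policyNS}
	}
	var out []string
	for name, lbls := range nsLabels {
		if labelsMatch(nsSel, lbls) {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}
```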
|
||||
|
||||
// translatePorts converts NetworkPolicyPort entries into PortMatch.
|
||||
//
|
||||
// A nil/empty Ports list on a NetworkPolicy rule means "all ports" by
|
||||
// spec; we represent that as a single zero-valued PortMatch (any proto,
|
||||
// any port) so the renderer can emit a single rule rather than a chain
|
||||
// of port-equality matches.
|
||||
func translatePorts(ports []netv1.NetworkPolicyPort) ([]PortMatch, error) {
|
||||
if len(ports) == 0 {
|
||||
return []PortMatch{{}}, nil
|
||||
}
|
||||
var out []PortMatch
|
||||
for i, p := range ports {
|
||||
var protoStr string
|
||||
if p.Protocol != nil {
|
||||
switch *p.Protocol {
|
||||
case "TCP":
|
||||
protoStr = "tcp"
|
||||
case "UDP":
|
||||
protoStr = "udp"
|
||||
case "SCTP":
|
||||
protoStr = "sctp"
|
||||
default:
|
||||
return nil, fmt.Errorf("port[%d]: protocol %q not supported", i, *p.Protocol)
|
||||
}
|
||||
} else {
|
||||
// Protocol unset: the spec default is TCP. (The API server normally
// fills this in, so this branch mainly covers hand-built objects.)
// The "any protocol" representation, an empty Protocol, is reserved
// for a rule with no Ports list at all; see len(ports) == 0 above.
|
||||
protoStr = "tcp"
|
||||
}
|
||||
var port, endPort int
|
||||
if p.Port != nil {
|
||||
if p.Port.Type != 0 { // intstr.Int = 0; intstr.String = 1
|
||||
return nil, fmt.Errorf("port[%d]: named ports are not yet supported", i)
|
||||
}
|
||||
port = int(p.Port.IntVal)
|
||||
}
|
||||
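if p.EndPort != nil {
if p.Port == nil {
// Per the spec, endPort is only valid together with a numeric port.
return nil, fmt.Errorf("port[%d]: endPort set without port", i)
}
endPort = int(*p.EndPort)
if endPort < port {
return nil, fmt.Errorf("port[%d]: endPort %d < port %d", i, endPort, port)
}
}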
|
||||
out = append(out, PortMatch{Protocol: protoStr, Port: port, EndPort: endPort})
|
||||
}
|
||||
return out, nil
|
||||
}
|
||||
|
||||
// assembleRules turns one resolved peer-set and its port list into Rules.
// It currently emits a single Rule per source rule: the peer-set is the
// expensive shared field, so ports stay inline and the renderer expands
// them into one nft line per port and address family. allowAll peers
// produce a Rule with no PeerCIDRs, which the renderer treats as "any peer".
|
||||
func assembleRules(pod Pod, dir Direction, peers peerSet, ports []PortMatch) []Rule {
|
||||
if !peers.allowAll && len(peers.CIDRs) == 0 {
|
||||
// Selector matched no peers (e.g. podSelector for a label that
|
||||
// no live pod has). Emit nothing — the rule cannot allow any
|
||||
// real traffic. The pod stays in default-deny for this rule.
|
||||
return nil
|
||||
}
|
||||
r := Rule{
|
||||
PodKey: pod.Namespace + "/" + pod.Name,
|
||||
HostIface: pod.HostIface,
|
||||
PodIPs: append([]net.IP(nil), pod.IPs...),
|
||||
Direction: dir,
|
||||
Action: ActionAccept,
|
||||
Ports: append([]PortMatch(nil), ports...),
|
||||
}
|
||||
if !peers.allowAll {
|
||||
r.PeerCIDRs = append([]*net.IPNet(nil), peers.CIDRs...)
|
||||
r.PeerExcept = append([]*net.IPNet(nil), peers.Except...)
|
||||
}
|
||||
return []Rule{r}
|
||||
}
|
||||
|
||||
// canonicalisePolicies sorts the policy slice by (namespace, name) so the
|
||||
// translator's output is deterministic regardless of informer event order.
|
||||
func canonicalisePolicies(p []netv1.NetworkPolicy) []netv1.NetworkPolicy {
|
||||
out := append([]netv1.NetworkPolicy(nil), p...)
|
||||
sort.Slice(out, func(i, j int) bool {
|
||||
if out[i].Namespace != out[j].Namespace {
|
||||
return out[i].Namespace < out[j].Namespace
|
||||
}
|
||||
return out[i].Name < out[j].Name
|
||||
})
|
||||
return out
|
||||
}
|
||||
|
||||
func indexNamespaces(nss []Namespace) map[string]Namespace {
|
||||
out := make(map[string]Namespace, len(nss))
|
||||
for _, ns := range nss {
|
||||
out[ns.Name] = ns
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
func indexPeerPods(pods []PeerPod) map[string][]PeerPod {
|
||||
out := map[string][]PeerPod{}
|
||||
for _, p := range pods {
|
||||
out[p.Namespace] = append(out[p.Namespace], p)
|
||||
}
|
||||
// Sort each namespace's pod list by name so the translator's IP
|
||||
// ordering is stable.
|
||||
for k := range out {
|
||||
sort.Slice(out[k], func(i, j int) bool { return out[k][i].Name < out[k][j].Name })
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
// ipToHostCIDR returns ip/32 (v4) or ip/128 (v6) — the smallest CIDR
|
||||
// covering exactly that one address.
|
||||
func ipToHostCIDR(ip net.IP) *net.IPNet {
|
||||
if v4 := ip.To4(); v4 != nil {
|
||||
return &net.IPNet{IP: v4, Mask: net.CIDRMask(32, 32)}
|
||||
}
|
||||
return &net.IPNet{IP: ip.To16(), Mask: net.CIDRMask(128, 128)}
|
||||
}
|
||||
@@ -0,0 +1,147 @@
|
||||
package netpol
|
||||
|
||||
import (
|
||||
"net"
|
||||
"strings"
|
||||
"testing"
|
||||
|
||||
corev1 "k8s.io/api/core/v1"
|
||||
netv1 "k8s.io/api/networking/v1"
|
||||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
|
||||
"k8s.io/apimachinery/pkg/util/intstr"
|
||||
)
|
||||
|
||||
// FuzzTranslate_AndRender stitches the Translator and Renderer together
|
||||
// against synthetic NetworkPolicies built from fuzzed bytes. We are not
|
||||
// trying to produce *valid* policies — the goal is to confirm that:
|
||||
//
|
||||
// 1. Neither stage panics on weird input.
|
||||
// 2. Render output has as many "{" as "}" (a cheap balance check).
|
||||
// 3. Rendering twice is byte-stable.
|
||||
// 4. The Pods set in Output is consistent with Isolated (every isolated
|
||||
// PodKey has a matching entry in Pods).
|
||||
//
|
||||
// A no-op warn callback is passed in so the translator's warning paths
|
||||
// run too; they must not panic on unexpected input either.
|
||||
func FuzzTranslate_AndRender(f *testing.F) {
|
||||
type seed struct {
|
||||
policyNS, policyName string
|
||||
podSelectorKey, podSelValue string
|
||||
peerSelectorKey, peerSelV string
|
||||
peerNS, peerName, peerIP string
|
||||
port uint16
|
||||
ipBlockCIDR, ipBlockExcept string
|
||||
}
|
||||
for _, s := range []seed{
|
||||
{policyNS: "ns1", policyName: "p1", podSelectorKey: "app", podSelValue: "web", port: 80},
|
||||
{policyNS: "ns1", policyName: "p1", peerSelectorKey: "app", peerSelV: "client", peerNS: "ns1", peerName: "c1", peerIP: "2001:db8::aa", port: 443},
|
||||
{policyNS: "ns1", policyName: "p1", ipBlockCIDR: "10.0.0.0/8", ipBlockExcept: "10.99.0.0/16", port: 0},
|
||||
{policyNS: "", policyName: ""}, // pathological
|
||||
{policyNS: "ns1", policyName: "p1", podSelectorKey: "app\x00", podSelValue: "web\nnewline"},
|
||||
{policyNS: "ns1", policyName: "p1", port: 65535},
|
||||
{policyNS: "ns1", policyName: "p1", port: 1},
|
||||
} {
|
||||
f.Add(s.policyNS, s.policyName, s.podSelectorKey, s.podSelValue,
|
||||
s.peerSelectorKey, s.peerSelV, s.peerNS, s.peerName, s.peerIP,
|
||||
s.port, s.ipBlockCIDR, s.ipBlockExcept)
|
||||
}
|
||||
|
||||
f.Fuzz(func(t *testing.T,
|
||||
policyNS, policyName,
|
||||
podSelectorKey, podSelValue,
|
||||
peerSelectorKey, peerSelV,
|
||||
peerNS, peerName, peerIP string,
|
||||
port uint16,
|
||||
ipBlockCIDR, ipBlockExcept string,
|
||||
) {
|
||||
// Build a synthetic policy.
|
||||
policy := netv1.NetworkPolicy{
|
||||
ObjectMeta: metav1.ObjectMeta{Namespace: policyNS, Name: policyName},
|
||||
Spec: netv1.NetworkPolicySpec{
|
||||
PolicyTypes: []netv1.PolicyType{netv1.PolicyTypeIngress},
|
||||
},
|
||||
}
|
||||
if podSelectorKey != "" {
|
||||
policy.Spec.PodSelector = metav1.LabelSelector{
|
||||
MatchLabels: map[string]string{podSelectorKey: podSelValue},
|
||||
}
|
||||
} else {
|
||||
policy.Spec.PodSelector = metav1.LabelSelector{}
|
||||
}
|
||||
ingress := netv1.NetworkPolicyIngressRule{}
|
||||
if peerSelectorKey != "" {
|
||||
ingress.From = append(ingress.From, netv1.NetworkPolicyPeer{
|
||||
PodSelector: &metav1.LabelSelector{
|
||||
MatchLabels: map[string]string{peerSelectorKey: peerSelV},
|
||||
},
|
||||
})
|
||||
}
|
||||
if ipBlockCIDR != "" {
|
||||
peer := netv1.NetworkPolicyPeer{
|
||||
IPBlock: &netv1.IPBlock{CIDR: ipBlockCIDR},
|
||||
}
|
||||
if ipBlockExcept != "" {
|
||||
peer.IPBlock.Except = []string{ipBlockExcept}
|
||||
}
|
||||
ingress.From = append(ingress.From, peer)
|
||||
}
|
||||
if port != 0 {
|
||||
tcp := corev1.ProtocolTCP
|
||||
p := intstr.FromInt32(int32(port))
|
||||
ingress.Ports = append(ingress.Ports, netv1.NetworkPolicyPort{
|
||||
Protocol: &tcp, Port: &p,
|
||||
})
|
||||
}
|
||||
policy.Spec.Ingress = append(policy.Spec.Ingress, ingress)
|
||||
|
||||
// Local pod, possibly matching the policy.
|
||||
pod := Pod{
|
||||
Namespace: "ns1", Name: "web",
|
||||
Labels: map[string]string{podSelectorKey: podSelValue, "app": "web"},
|
||||
HostIface: "flock00000001",
|
||||
IPs: []net.IP{mustIP("2001:db8::1")},
|
||||
}
|
||||
// Peer pod, possibly matching the peer selector.
|
||||
var peers []PeerPod
|
||||
if peerName != "" {
|
||||
peerIPParsed := net.ParseIP(peerIP)
|
||||
if peerIPParsed != nil {
|
||||
peers = append(peers, PeerPod{
|
||||
Namespace: peerNS, Name: peerName,
|
||||
Labels: map[string]string{peerSelectorKey: peerSelV},
|
||||
IPs: []net.IP{peerIPParsed},
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
out, err := Translate(Inputs{
|
||||
LocalPods: []Pod{pod},
|
||||
PeerPods: peers,
|
||||
Namespaces: []Namespace{
|
||||
{Name: "ns1", Labels: map[string]string{"kubernetes.io/metadata.name": "ns1"}},
|
||||
},
|
||||
Policies: []netv1.NetworkPolicy{policy},
|
||||
}, func(string) {})
|
||||
if err != nil {
|
||||
return // any error is acceptable
|
||||
}
|
||||
|
||||
// Property: every isolated PodKey appears in Output.Pods.
|
||||
for iso := range out.Isolated {
|
||||
if _, ok := out.Pods[iso.PodKey]; !ok {
|
||||
t.Fatalf("isolated %s has no Pods entry", iso.PodKey)
|
||||
}
|
||||
}
|
||||
|
||||
script := Render(out)
|
||||
// Property: balanced braces.
|
||||
if got := strings.Count(script, "{") - strings.Count(script, "}"); got != 0 {
|
||||
t.Fatalf("unbalanced braces (%d):\n%s", got, script)
|
||||
}
|
||||
// Property: deterministic (run again, compare).
|
||||
script2 := Render(out)
|
||||
if script != script2 {
|
||||
t.Fatalf("Render not deterministic")
|
||||
}
|
||||
})
|
||||
}
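Property 2 of the fuzz test compares the total counts of "{" and "}", which also accepts a misordered script such as `} {`. A stricter check, sketched below as a hypothetical helper (`bracesBalanced` is not part of the netpol package), tracks nesting depth and fails as soon as the depth goes negative. Like the count comparison, it assumes the rendered script carries no braces inside comments or quoted strings.

```go
package main

import "fmt"

// bracesBalanced reports whether every "}" closes an earlier "{" and every
// "{" is eventually closed. Unlike comparing strings.Count totals, it rejects
// inputs whose counts match but whose order does not. It does not skip
// braces that appear inside nft comments or quoted strings.
func bracesBalanced(s string) bool {
	depth := 0
	for _, r := range s {
		switch r {
		case '{':
			depth++
		case '}':
			depth--
			if depth < 0 {
				return false // closed before it was opened
			}
		}
	}
	return depth == 0
}

func main() {
	fmt.Println(bracesBalanced("table inet flock { chain c { } }")) // true
	fmt.Println(bracesBalanced("} {"))                              // false
}
```

Using it in the fuzz target would mean replacing the `strings.Count` comparison with a call to `bracesBalanced(script)` and failing the test when it returns false.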
|
||||
@@ -0,0 +1,452 @@
|
||||
package netpol
|
||||
|
||||
import (
|
||||
"net"
|
||||
"testing"
|
||||
|
||||
corev1 "k8s.io/api/core/v1"
|
||||
netv1 "k8s.io/api/networking/v1"
|
||||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
|
||||
"k8s.io/apimachinery/pkg/util/intstr"
|
||||
)
|
||||
|
||||
func mustIP(s string) net.IP {
|
||||
ip := net.ParseIP(s)
|
||||
if ip == nil {
|
||||
panic("bad IP: " + s)
|
||||
}
|
||||
return ip
|
||||
}
|
||||
|
||||
func newPolicy(ns, name string, mods ...func(*netv1.NetworkPolicy)) netv1.NetworkPolicy {
|
||||
p := netv1.NetworkPolicy{
|
||||
ObjectMeta: metav1.ObjectMeta{Namespace: ns, Name: name},
|
||||
Spec: netv1.NetworkPolicySpec{},
|
||||
}
|
||||
for _, m := range mods {
|
||||
m(&p)
|
||||
}
|
||||
return p
|
||||
}
|
||||
|
||||
func tcpPort(port int) netv1.NetworkPolicyPort {
|
||||
proto := corev1.ProtocolTCP
|
||||
p := intstr.FromInt32(int32(port))
|
||||
return netv1.NetworkPolicyPort{Protocol: &proto, Port: &p}
|
||||
}
|
||||
|
||||
// emptySelector returns the empty selector `{}`, which matches every pod.
|
||||
func emptySelector() *metav1.LabelSelector {
|
||||
return &metav1.LabelSelector{}
|
||||
}
|
||||
|
||||
func selectorMatching(kv map[string]string) *metav1.LabelSelector {
|
||||
return &metav1.LabelSelector{MatchLabels: kv}
|
||||
}
|
||||
|
||||
// isolationFor reports whether podKey is isolated for ingress and/or egress in out.
|
||||
func isolationFor(out Output, podKey string) (in, eg bool) {
|
||||
if _, ok := out.Isolated[Isolation{PodKey: podKey, Direction: DirIngress}]; ok {
|
||||
in = true
|
||||
}
|
||||
if _, ok := out.Isolated[Isolation{PodKey: podKey, Direction: DirEgress}]; ok {
|
||||
eg = true
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
// TestTranslate_NoPolicies — pod with no matching policy is unrestricted.
|
||||
func TestTranslate_NoPolicies(t *testing.T) {
|
||||
pod := Pod{
|
||||
Namespace: "ns1", Name: "p1",
|
||||
Labels: map[string]string{"app": "web"},
|
||||
HostIface: "flock00000001",
|
||||
IPs: []net.IP{mustIP("2001:db8::1")},
|
||||
}
|
||||
out, err := Translate(Inputs{LocalPods: []Pod{pod}}, nil)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if len(out.Rules) != 0 {
|
||||
t.Fatalf("expected no rules, got %d", len(out.Rules))
|
||||
}
|
||||
in, eg := isolationFor(out, "ns1/p1")
|
||||
if in || eg {
|
||||
t.Fatalf("pod should not be isolated: in=%v eg=%v", in, eg)
|
||||
}
|
||||
}
|
||||
|
||||
// TestTranslate_DefaultDenyIngress — a policy with empty Ingress + PolicyTypes
|
||||
// = [Ingress] selects the pod and isolates it; no allow rules emitted.
|
||||
func TestTranslate_DefaultDenyIngress(t *testing.T) {
|
||||
pod := Pod{
|
||||
Namespace: "ns1", Name: "web",
|
||||
Labels: map[string]string{"app": "web"},
|
||||
HostIface: "flock00000001",
|
||||
IPs: []net.IP{mustIP("2001:db8::1")},
|
||||
}
|
||||
policy := newPolicy("ns1", "default-deny", func(p *netv1.NetworkPolicy) {
|
||||
p.Spec.PodSelector = *emptySelector()
|
||||
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
|
||||
})
|
||||
out, err := Translate(Inputs{
|
||||
LocalPods: []Pod{pod},
|
||||
Policies: []netv1.NetworkPolicy{policy},
|
||||
}, nil)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if len(out.Rules) != 0 {
|
||||
t.Fatalf("expected no rules from a deny-all, got %d", len(out.Rules))
|
||||
}
|
||||
in, eg := isolationFor(out, "ns1/web")
|
||||
if !in {
|
||||
t.Fatalf("ingress should be isolated")
|
||||
}
|
||||
if eg {
|
||||
t.Fatalf("egress should NOT be isolated (policy only set ingress)")
|
||||
}
|
||||
}
|
||||
|
||||
// TestTranslate_DefaultDenyEgress_InferredFromEgressList — when
|
||||
// PolicyTypes is omitted but Spec.Egress is non-empty, egress should
|
||||
// also be isolated by inference.
|
||||
func TestTranslate_DefaultDenyEgress_InferredFromEgressList(t *testing.T) {
|
||||
pod := Pod{
|
||||
Namespace: "ns1", Name: "web",
|
||||
Labels: map[string]string{"app": "web"},
|
||||
HostIface: "f1", IPs: []net.IP{mustIP("2001:db8::1")},
|
||||
}
|
||||
policy := newPolicy("ns1", "egress-rule", func(p *netv1.NetworkPolicy) {
|
||||
p.Spec.PodSelector = *emptySelector()
|
||||
p.Spec.Egress = []netv1.NetworkPolicyEgressRule{{}}
|
||||
})
|
||||
out, _ := Translate(Inputs{LocalPods: []Pod{pod}, Policies: []netv1.NetworkPolicy{policy}}, nil)
|
||||
in, eg := isolationFor(out, "ns1/web")
|
||||
if !in || !eg {
|
||||
t.Fatalf("both directions should be isolated: in=%v eg=%v", in, eg)
|
||||
}
|
||||
}
|
||||
|
||||
// TestTranslate_PodSelectorPeer — peer is a single pod in
|
||||
// the same namespace, identified by label.
|
||||
func TestTranslate_PodSelectorPeer(t *testing.T) {
|
||||
web := Pod{
|
||||
Namespace: "ns1", Name: "web",
|
||||
Labels: map[string]string{"app": "web"},
|
||||
HostIface: "f1", IPs: []net.IP{mustIP("2001:db8::1")},
|
||||
}
|
||||
clientIP := mustIP("2001:db8::2")
|
||||
peer := PeerPod{
|
||||
Namespace: "ns1", Name: "client",
|
||||
Labels: map[string]string{"app": "client"},
|
||||
IPs: []net.IP{clientIP},
|
||||
}
|
||||
policy := newPolicy("ns1", "allow-from-client", func(p *netv1.NetworkPolicy) {
|
||||
p.Spec.PodSelector = *selectorMatching(map[string]string{"app": "web"})
|
||||
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
|
||||
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
|
||||
From: []netv1.NetworkPolicyPeer{{
|
||||
PodSelector: selectorMatching(map[string]string{"app": "client"}),
|
||||
}},
|
||||
Ports: []netv1.NetworkPolicyPort{tcpPort(80)},
|
||||
}}
|
||||
})
|
||||
|
||||
out, err := Translate(Inputs{
|
||||
LocalPods: []Pod{web},
|
||||
PeerPods: []PeerPod{peer},
|
||||
Policies: []netv1.NetworkPolicy{policy},
|
||||
}, nil)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if len(out.Rules) != 1 {
|
||||
t.Fatalf("expected 1 rule, got %d: %+v", len(out.Rules), out.Rules)
|
||||
}
|
||||
r := out.Rules[0]
|
||||
if r.PodKey != "ns1/web" || r.Direction != DirIngress {
|
||||
t.Fatalf("rule has wrong subject: %+v", r)
|
||||
}
|
||||
if len(r.PeerCIDRs) != 1 || !r.PeerCIDRs[0].IP.Equal(clientIP) {
|
||||
t.Fatalf("peer CIDR wrong: %+v", r.PeerCIDRs)
|
||||
}
|
||||
if len(r.Ports) != 1 || r.Ports[0].Protocol != "tcp" || r.Ports[0].Port != 80 {
|
||||
t.Fatalf("port wrong: %+v", r.Ports)
|
||||
}
|
||||
}
|
||||
|
||||
// TestTranslate_NamespaceSelector — peer is "every pod in any namespace
|
||||
// with label tier=trusted".
|
||||
func TestTranslate_NamespaceSelector(t *testing.T) {
|
||||
web := Pod{
|
||||
Namespace: "ns1", Name: "web",
|
||||
Labels: map[string]string{"app": "web"},
|
||||
HostIface: "f1", IPs: []net.IP{mustIP("2001:db8::1")},
|
||||
}
|
||||
out, err := Translate(Inputs{
|
||||
LocalPods: []Pod{web},
|
||||
Namespaces: []Namespace{
|
||||
{Name: "ns1", Labels: map[string]string{}},
|
||||
{Name: "trusted-1", Labels: map[string]string{"tier": "trusted"}},
|
||||
{Name: "trusted-2", Labels: map[string]string{"tier": "trusted"}},
|
||||
{Name: "untrusted", Labels: map[string]string{"tier": "wild"}},
|
||||
},
|
||||
PeerPods: []PeerPod{
|
||||
{Namespace: "trusted-1", Name: "a", IPs: []net.IP{mustIP("2001:db8::a")}},
|
||||
{Namespace: "trusted-2", Name: "b", IPs: []net.IP{mustIP("2001:db8::b")}},
|
||||
{Namespace: "untrusted", Name: "x", IPs: []net.IP{mustIP("2001:db8::ff")}},
|
||||
},
|
||||
Policies: []netv1.NetworkPolicy{newPolicy("ns1", "allow-trusted", func(p *netv1.NetworkPolicy) {
|
||||
p.Spec.PodSelector = *emptySelector()
|
||||
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
|
||||
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
|
||||
From: []netv1.NetworkPolicyPeer{{
|
||||
NamespaceSelector: selectorMatching(map[string]string{"tier": "trusted"}),
|
||||
}},
|
||||
}}
|
||||
})},
|
||||
}, nil)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if len(out.Rules) != 1 {
|
||||
t.Fatalf("expected 1 rule, got %d", len(out.Rules))
|
||||
}
|
||||
got := map[string]bool{}
|
||||
for _, c := range out.Rules[0].PeerCIDRs {
|
||||
got[c.IP.String()] = true
|
||||
}
|
||||
if !got["2001:db8::a"] || !got["2001:db8::b"] {
|
||||
t.Fatalf("trusted pod IPs missing: %v", got)
|
||||
}
|
||||
if got["2001:db8::ff"] {
|
||||
t.Fatalf("untrusted pod IP leaked into rule")
|
||||
}
|
||||
}
|
||||
|
||||
// TestTranslate_IPBlockWithExcept — ipBlock with two except ranges; both must reach PeerExcept.
|
||||
func TestTranslate_IPBlockWithExcept(t *testing.T) {
|
||||
pod := Pod{
|
||||
Namespace: "ns1", Name: "web", HostIface: "f1",
|
||||
Labels: map[string]string{"app": "web"},
|
||||
IPs: []net.IP{mustIP("10.0.0.1")},
|
||||
}
|
||||
policy := newPolicy("ns1", "ipblock", func(p *netv1.NetworkPolicy) {
|
||||
p.Spec.PodSelector = *emptySelector()
|
||||
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
|
||||
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
|
||||
From: []netv1.NetworkPolicyPeer{{
|
||||
IPBlock: &netv1.IPBlock{
|
||||
CIDR: "10.0.0.0/8",
|
||||
Except: []string{"10.99.0.0/16", "10.42.42.0/24"},
|
||||
},
|
||||
}},
|
||||
}}
|
||||
})
|
||||
out, err := Translate(Inputs{
|
||||
LocalPods: []Pod{pod},
|
||||
Policies: []netv1.NetworkPolicy{policy},
|
||||
}, nil)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if len(out.Rules) != 1 {
|
||||
t.Fatalf("expected 1 rule, got %d", len(out.Rules))
|
||||
}
|
||||
r := out.Rules[0]
|
||||
if len(r.PeerCIDRs) != 1 || r.PeerCIDRs[0].String() != "10.0.0.0/8" {
|
||||
t.Fatalf("peer CIDR wrong: %v", r.PeerCIDRs)
|
||||
}
|
||||
if len(r.PeerExcept) != 2 {
|
||||
t.Fatalf("expected 2 except, got %d", len(r.PeerExcept))
|
||||
}
|
||||
}
|
||||
|
||||
// TestTranslate_AllowAllPeers — empty From list means "from anywhere".
|
||||
func TestTranslate_AllowAllPeers(t *testing.T) {
|
||||
pod := Pod{
|
||||
Namespace: "ns1", Name: "web", HostIface: "f1",
|
||||
Labels: map[string]string{"app": "web"},
|
||||
IPs: []net.IP{mustIP("2001:db8::1")},
|
||||
}
|
||||
policy := newPolicy("ns1", "allow-all-on-port", func(p *netv1.NetworkPolicy) {
|
||||
p.Spec.PodSelector = *emptySelector()
|
||||
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
|
||||
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
|
||||
Ports: []netv1.NetworkPolicyPort{tcpPort(443)},
|
||||
}}
|
||||
})
|
||||
out, _ := Translate(Inputs{LocalPods: []Pod{pod}, Policies: []netv1.NetworkPolicy{policy}}, nil)
|
||||
if len(out.Rules) != 1 {
|
||||
t.Fatalf("expected 1 rule, got %d", len(out.Rules))
|
||||
}
|
||||
r := out.Rules[0]
|
||||
if len(r.PeerCIDRs) != 0 || len(r.PeerExcept) != 0 {
|
||||
t.Fatalf("expected allow-all peers, got CIDRs=%v Except=%v", r.PeerCIDRs, r.PeerExcept)
|
||||
}
|
||||
}
|
||||
|
||||
// TestTranslate_AllowAllPorts — empty Ports list means "all ports".
|
||||
func TestTranslate_AllowAllPorts(t *testing.T) {
|
||||
pod := Pod{
|
||||
Namespace: "ns1", Name: "web", HostIface: "f1",
|
||||
Labels: map[string]string{"app": "web"},
|
||||
IPs: []net.IP{mustIP("2001:db8::1")},
|
||||
}
|
||||
policy := newPolicy("ns1", "allow-from-all", func(p *netv1.NetworkPolicy) {
|
||||
p.Spec.PodSelector = *emptySelector()
|
||||
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
|
||||
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
|
||||
From: []netv1.NetworkPolicyPeer{{
|
||||
PodSelector: emptySelector(),
|
||||
}},
|
||||
}}
|
||||
})
|
||||
peer := PeerPod{
|
||||
Namespace: "ns1", Name: "x",
|
||||
IPs: []net.IP{mustIP("2001:db8::aa")},
|
||||
}
|
||||
out, _ := Translate(Inputs{
|
||||
LocalPods: []Pod{pod}, PeerPods: []PeerPod{peer},
|
||||
Policies: []netv1.NetworkPolicy{policy},
|
||||
}, nil)
|
||||
if len(out.Rules) != 1 {
|
||||
t.Fatalf("expected 1 rule, got %d", len(out.Rules))
|
||||
}
|
||||
r := out.Rules[0]
|
||||
if len(r.Ports) != 1 || r.Ports[0] != (PortMatch{}) {
|
||||
t.Fatalf("expected single any-port match, got %+v", r.Ports)
|
||||
}
|
||||
}
|
||||
|
||||
// TestTranslate_PortRange — port + endPort survive as an inclusive range.
|
||||
func TestTranslate_PortRange(t *testing.T) {
|
||||
pod := Pod{
|
||||
Namespace: "ns1", Name: "web", HostIface: "f1",
|
||||
Labels: map[string]string{"app": "web"},
|
||||
IPs: []net.IP{mustIP("2001:db8::1")},
|
||||
}
|
||||
policy := newPolicy("ns1", "range", func(p *netv1.NetworkPolicy) {
|
||||
p.Spec.PodSelector = *emptySelector()
|
||||
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
|
||||
proto := corev1.ProtocolTCP
|
||||
port := intstr.FromInt32(8000)
|
||||
end := int32(8999)
|
||||
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
|
||||
Ports: []netv1.NetworkPolicyPort{{Protocol: &proto, Port: &port, EndPort: &end}},
|
||||
}}
|
||||
})
|
||||
out, _ := Translate(Inputs{LocalPods: []Pod{pod}, Policies: []netv1.NetworkPolicy{policy}}, nil)
|
||||
if len(out.Rules) != 1 || out.Rules[0].Ports[0].Port != 8000 || out.Rules[0].Ports[0].EndPort != 8999 {
|
||||
t.Fatalf("range not preserved: %+v", out.Rules)
|
||||
}
|
||||
}
|
||||
|
||||
// TestTranslate_NamedPortRejected — named ports aren't supported yet;
|
||||
// translator must skip the rule and warn.
|
||||
func TestTranslate_NamedPortRejected(t *testing.T) {
|
||||
pod := Pod{
|
||||
Namespace: "ns1", Name: "web", HostIface: "f1",
|
||||
Labels: map[string]string{"app": "web"},
|
||||
IPs: []net.IP{mustIP("2001:db8::1")},
|
||||
}
|
||||
proto := corev1.ProtocolTCP
|
||||
named := intstr.FromString("http")
|
||||
policy := newPolicy("ns1", "named", func(p *netv1.NetworkPolicy) {
|
||||
p.Spec.PodSelector = *emptySelector()
|
||||
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
|
||||
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
|
||||
Ports: []netv1.NetworkPolicyPort{{Protocol: &proto, Port: &named}},
|
||||
}}
|
||||
})
|
||||
var warns []string
|
||||
out, _ := Translate(Inputs{LocalPods: []Pod{pod}, Policies: []netv1.NetworkPolicy{policy}}, func(s string) {
|
||||
warns = append(warns, s)
|
||||
})
|
||||
if len(out.Rules) != 0 {
|
||||
t.Fatalf("expected named-port rule to be skipped")
|
||||
}
|
||||
if len(warns) == 0 {
|
||||
t.Fatalf("expected a warning about named ports")
|
||||
}
|
||||
// The pod should still be isolated since the policy selected it.
|
||||
in, _ := isolationFor(out, "ns1/web")
|
||||
if !in {
|
||||
t.Fatalf("pod should be isolated even when its rule is dropped")
|
||||
}
|
||||
}
|
||||
|
||||
// TestTranslate_PolicyScopedToNamespace — a policy in nsA does
|
||||
// NOT select pods in nsB even if their labels match.
|
||||
func TestTranslate_PolicyScopedToNamespace(t *testing.T) {
|
||||
a := Pod{Namespace: "nsA", Name: "p", HostIface: "f1",
|
||||
Labels: map[string]string{"app": "web"}, IPs: []net.IP{mustIP("2001:db8::1")}}
|
||||
b := Pod{Namespace: "nsB", Name: "p", HostIface: "f2",
|
||||
Labels: map[string]string{"app": "web"}, IPs: []net.IP{mustIP("2001:db8::2")}}
|
||||
policy := newPolicy("nsA", "deny", func(p *netv1.NetworkPolicy) {
|
||||
p.Spec.PodSelector = *selectorMatching(map[string]string{"app": "web"})
|
||||
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
|
||||
})
|
||||
out, _ := Translate(Inputs{LocalPods: []Pod{a, b}, Policies: []netv1.NetworkPolicy{policy}}, nil)
|
||||
inA, _ := isolationFor(out, "nsA/p")
|
||||
inB, _ := isolationFor(out, "nsB/p")
|
||||
if !inA {
|
||||
t.Fatalf("nsA/p should be isolated")
|
||||
}
|
||||
if inB {
|
||||
t.Fatalf("nsB/p must NOT be isolated by a policy in nsA")
|
||||
}
|
||||
}
|
||||
|
||||
// TestTranslate_PodWithoutAllocationSkipped — pod with no IPs is silently
|
||||
// skipped (its rule could not match any traffic anyway).
|
||||
func TestTranslate_PodWithoutAllocationSkipped(t *testing.T) {
|
||||
pod := Pod{Namespace: "ns1", Name: "p", HostIface: "f1",
|
||||
Labels: map[string]string{"app": "web"}}
|
||||
policy := newPolicy("ns1", "deny", func(p *netv1.NetworkPolicy) {
|
||||
p.Spec.PodSelector = *emptySelector()
|
||||
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
|
||||
})
|
||||
out, _ := Translate(Inputs{LocalPods: []Pod{pod}, Policies: []netv1.NetworkPolicy{policy}}, nil)
|
||||
in, _ := isolationFor(out, "ns1/p")
|
||||
if in {
|
||||
t.Fatalf("pod without IP should not appear in output")
|
||||
}
|
||||
}
|
||||
|
||||
// TestTranslate_Determinism — translating the same Inputs twice produces
|
||||
// equal outputs (Rules in equal order, Isolated equal).
|
||||
func TestTranslate_Determinism(t *testing.T) {
|
||||
pod := Pod{
|
||||
Namespace: "ns1", Name: "web", HostIface: "f1",
|
||||
Labels: map[string]string{"app": "web"},
|
||||
IPs: []net.IP{mustIP("2001:db8::1")},
|
||||
}
|
||||
peers := []PeerPod{
|
||||
{Namespace: "ns1", Name: "z", Labels: map[string]string{"app": "client"}, IPs: []net.IP{mustIP("2001:db8::2")}},
|
||||
{Namespace: "ns1", Name: "a", Labels: map[string]string{"app": "client"}, IPs: []net.IP{mustIP("2001:db8::3")}},
|
||||
}
|
||||
policies := []netv1.NetworkPolicy{
|
||||
newPolicy("ns1", "z-second", func(p *netv1.NetworkPolicy) {
|
||||
p.Spec.PodSelector = *emptySelector()
|
||||
p.Spec.PolicyTypes = []netv1.PolicyType{netv1.PolicyTypeIngress}
|
||||
p.Spec.Ingress = []netv1.NetworkPolicyIngressRule{{
|
||||
From: []netv1.NetworkPolicyPeer{{
|
||||
PodSelector: selectorMatching(map[string]string{"app": "client"}),
|
||||
}},
|
||||
}}
|
||||
}),
|
||||
}
|
||||
in := Inputs{LocalPods: []Pod{pod}, PeerPods: peers, Policies: policies}
|
||||
a, _ := Translate(in, nil)
|
||||
b, _ := Translate(in, nil)
|
||||
if len(a.Rules) != len(b.Rules) {
|
||||
t.Fatalf("rule count differs: %d vs %d", len(a.Rules), len(b.Rules))
|
||||
}
|
||||
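for i := range a.Rules {
|
||||
ra, rb := a.Rules[i], b.Rules[i]
|
||||
if ra.PodKey != rb.PodKey || ra.Direction != rb.Direction || len(ra.PeerCIDRs) != len(rb.PeerCIDRs) {
|
||||
t.Fatalf("rule[%d] differs", i)
|
||||
}
|
||||
// Compare peers element-wise so an unstable IP order is caught.
|
||||
for j := range ra.PeerCIDRs {
|
||||
if ra.PeerCIDRs[j].String() != rb.PeerCIDRs[j].String() {
|
||||
t.Fatalf("rule[%d] peer[%d] differs: %s vs %s", i, j, ra.PeerCIDRs[j], rb.PeerCIDRs[j])
|
||||
}
|
||||
}
|
||||
}
|
||||
if len(a.Isolated) != len(b.Isolated) {
|
||||
t.Fatalf("isolated set differs: %d vs %d", len(a.Isolated), len(b.Isolated))
|
||||
}
|
||||
for k := range a.Isolated {
|
||||
if _, ok := b.Isolated[k]; !ok {
|
||||
t.Fatalf("isolation %+v missing from second run", k)
|
||||
}
|
||||
}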
|
||||
}
|
||||
@@ -0,0 +1,147 @@
|
||||
package netpol
|
||||
|
||||
import "net"
|
||||
|
||||
// Direction is the NetworkPolicy direction, named from the *pod's*
|
||||
// perspective (matching the NetworkPolicy API). "Ingress" is traffic
|
||||
// arriving at the pod; "Egress" is traffic the pod initiates.
|
||||
//
|
||||
// Note that on the host this maps the opposite way at the veth: an
|
||||
// Ingress rule matches packets whose oifname is the pod's host-side veth
|
||||
// (the kernel is forwarding into the pod), and an Egress rule matches
|
||||
// packets whose iifname is the pod's host-side veth (the kernel just
|
||||
// received from the pod).
|
||||
type Direction int
|
||||
|
||||
const (
|
||||
DirIngress Direction = iota
|
||||
DirEgress
|
||||
)
|
||||
|
||||
// String returns the lower-case wire form ("ingress" / "egress").
|
||||
func (d Direction) String() string {
|
||||
if d == DirEgress {
|
||||
return "egress"
|
||||
}
|
||||
return "ingress"
|
||||
}
|
||||
|
||||
// Pod is the local-pod information the translator needs. The reconciler
|
||||
// populates this from its store of CNI allocations — every pod with a
|
||||
// committed allocation on this node appears here.
|
||||
type Pod struct {
|
||||
// Namespace + Name uniquely identify the pod.
|
||||
Namespace string
|
||||
Name string
|
||||
// Labels are the pod labels. NetworkPolicy.Spec.PodSelector matches
|
||||
// against these.
|
||||
Labels map[string]string
|
||||
// HostIface is the host-side veth name (e.g. "flock1a2b3c4d"). All
|
||||
// rules guarding this pod hook off iifname/oifname == HostIface.
|
||||
HostIface string
|
||||
// IPs are the pod's eth0 addresses (IPv6 and/or IPv4). Empty means
|
||||
// the agent has no allocation for this pod yet — translator should
|
||||
// skip such pods.
|
||||
IPs []net.IP
|
||||
}
|
||||
|
||||
// PeerPod is a (potentially remote) pod whose IPs may be referenced as a
|
||||
// NetworkPolicy peer. The translator resolves podSelector +
|
||||
// namespaceSelector peers to their IPs by walking the cluster-wide
|
||||
// peer-pod set.
|
||||
type PeerPod struct {
|
||||
Namespace string
|
||||
Name string
|
||||
Labels map[string]string
|
||||
IPs []net.IP
|
||||
}
|
||||
|
||||
// Namespace carries just enough metadata for namespaceSelector matching.
|
||||
type Namespace struct {
|
||||
Name string
|
||||
Labels map[string]string
|
||||
}
|
||||
|
||||
// LocalPod is the renderer-visible subset of a local pod — just enough
|
||||
// to anchor a base-chain jump. Carried in Output so the renderer can
|
||||
// emit chains for default-deny pods that have no explicit allow rules.
|
||||
type LocalPod struct {
|
||||
PodKey string
|
||||
HostIface string
|
||||
IPs []net.IP
|
||||
}
|
||||
|
||||
// PortMatch is one allowed (protocol, port) tuple. EndPort is inclusive;
|
||||
// when zero the rule matches the single Port.
|
||||
type PortMatch struct {
|
||||
Protocol string // "tcp", "udp", "sctp"; empty means "any of the three"
|
||||
Port int // 1..65535. Zero means "any port".
|
||||
EndPort int // 0 if not a range; otherwise inclusive range end.
|
||||
}
|
||||
|
||||
// Rule is the canonical intermediate representation between the translator
|
||||
// and the renderer. One Rule is one accept-line in the rendered nft
|
||||
// script. A pod's chain is the ordered concatenation of every Rule whose
|
||||
// PodKey matches; any packet that falls off the end is denied by the
|
||||
// trailing default-deny verdict (the chain has policy drop).
|
||||
//
|
||||
// PeerCIDRs are OR'd together, then PeerExcept is subtracted. Empty
|
||||
// PeerCIDRs + empty PeerExcept means "any source/destination".
|
||||
type Rule struct {
|
||||
// PodKey is namespace/name of the pod this rule guards. Used by the
|
||||
// renderer to slot the rule into the correct chain.
|
||||
PodKey string
|
||||
|
||||
// HostIface is the pod's host-side veth name; the renderer uses it
|
||||
// to anchor the base-chain jump.
|
||||
HostIface string
|
||||
|
||||
// PodIPs are the pod's eth0 addresses. The base chain matches on
|
||||
// (oifname == HostIface AND daddr ∈ PodIPs) for ingress, and
|
||||
// (iifname == HostIface AND saddr ∈ PodIPs) for egress, so packets
|
||||
// that aren't destined to / from the actual pod address don't get
|
||||
// counted as policy-protected.
|
||||
PodIPs []net.IP
|
||||
|
||||
// Direction is Ingress or Egress, named from the pod's perspective.
|
||||
Direction Direction
|
||||
|
||||
// Action is "accept" for explicit allows; default-deny is implicit
|
||||
// in the chain's policy drop and is not represented as a Rule.
|
||||
// (Reserved for future deny-list semantics like AdminNetworkPolicy.)
|
||||
Action Action
|
||||
|
||||
// PeerCIDRs are the addresses of allowed peers. OR'd together.
|
||||
// Empty means "any peer".
|
||||
PeerCIDRs []*net.IPNet
|
||||
|
||||
// PeerExcept narrows PeerCIDRs by subtracting these ranges. Only
|
||||
// meaningful with non-empty PeerCIDRs (it comes from
|
||||
// ipBlock.except, which requires ipBlock.cidr).
|
||||
PeerExcept []*net.IPNet
|
||||
|
||||
// Ports is the set of allowed (protocol, port) tuples. Empty means
|
||||
// "any port / any protocol".
|
||||
Ports []PortMatch
|
||||
}
|
||||
|
||||
// Action is the verdict emitted by a Rule.
|
||||
type Action int
|
||||
|
||||
const (
|
||||
// ActionAccept lets the packet through. The default-deny is implicit
|
||||
// in the chain policy.
|
||||
ActionAccept Action = iota
|
||||
// ActionDrop is reserved for future use (AdminNetworkPolicy /
|
||||
// BaselineAdminNetworkPolicy explicit denies). Not produced by the
|
||||
// v1 translator.
|
||||
ActionDrop
|
||||
)
|
||||
|
||||
// String returns the nft-syntax verdict.
|
||||
func (a Action) String() string {
|
||||
if a == ActionDrop {
|
||||
return "drop"
|
||||
}
|
||||
return "accept"
|
||||
}
|
||||
@@ -0,0 +1,56 @@
|
||||
package agent
|
||||
|
||||
import (
|
||||
"net"
|
||||
|
||||
"code.fritzlab.net/fritzlab/flock/pkg/agent/netpol"
|
||||
)
|
||||
|
||||
// collectLocalPods bridges the agent's allocation store + pod informer
|
||||
// cache into the netpol-package input shape. It returns one Pod per
|
||||
// committed allocation that has a matching pod in the informer cache;
|
||||
// allocations whose pod was just deleted (DEL race) are skipped.
|
||||
//
|
||||
// Called on every netpol reconcile pass, so it must be cheap. The work
|
||||
// here is O(allocations) and reads from in-memory maps only.
|
||||
func collectLocalPods(store *Store, pods *PodCache) []netpol.Pod {
|
||||
allocs := store.Snapshot()
|
||||
out := make([]netpol.Pod, 0, len(allocs))
|
||||
for _, a := range allocs {
|
||||
if a.State != StateCommitted {
|
||||
continue
|
||||
}
|
||||
pod, ok := pods.Get(a.Namespace, a.PodName)
|
||||
if !ok {
|
||||
// Pod evicted but DEL hasn't fired yet; nothing to enforce.
|
||||
continue
|
||||
}
|
||||
ips := allocationIPs(a)
|
||||
if len(ips) == 0 {
|
||||
continue
|
||||
}
|
||||
out = append(out, netpol.Pod{
|
||||
Namespace: a.Namespace,
|
||||
Name: a.PodName,
|
||||
Labels: pod.Labels,
|
||||
HostIface: HostIfaceName(a.ContainerID),
|
||||
IPs: ips,
|
||||
})
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
func allocationIPs(a Allocation) []net.IP {
|
||||
var out []net.IP
|
||||
if a.IP6 != "" {
|
||||
if ip := net.ParseIP(a.IP6); ip != nil {
|
||||
out = append(out, ip)
|
||||
}
|
||||
}
|
||||
if a.IP4 != "" {
|
||||
if ip := net.ParseIP(a.IP4); ip != nil {
|
||||
out = append(out, ip)
|
||||
}
|
||||
}
|
||||
return out
|
||||
}
|
||||
@@ -7,6 +7,8 @@ import (
"fmt"
"net"
"time"

"code.fritzlab.net/fritzlab/flock/pkg/agent/netpol"
)

// configureRuntime wires Pod informer, IPAM, netlink, and BIRD on a real
@@ -103,6 +105,17 @@ func (s *Server) configureRuntime(ctx context.Context) error {
}
}()

// NetworkPolicy enforcement.
world := netpol.NewWorld(s.Logger)
if err := world.Start(ctx, s.restCfg); err != nil {
return fmt.Errorf("netpol informers: %w", err)
}
npApplier := &netpol.Applier{}
npReconciler := netpol.NewReconciler(world, func() []netpol.Pod {
return collectLocalPods(s.Store, pods)
}, npApplier, s.Logger)
go npReconciler.Run(ctx)

handler := &PodHandler{
Node: s.Node,
Store: s.Store,
@@ -111,7 +124,12 @@ func (s *Server) configureRuntime(ctx context.Context) error {
NodeConfig: s.NodeConfig,
SetupFunc: Setup,
TeardownFunc: Teardown,
AfterCommit: anycast.Trigger,
AfterCommit: func() {
anycast.Trigger()
// Re-evaluate policy on every CNI ADD/DEL so a brand-new
// pod's chain lands before its first packet egresses.
npReconciler.Trigger()
},
}
s.RPC.SetHandlers(handler.Add, handler.Del, handler.Check)
s.Logger.Info("runtime ready",

+3
-3
@@ -1,6 +1,6 @@
// Package agent owns the in-process flock-agent runtime: IPAM, netns, state,
// anycast, and NetworkPolicy. This file implements the durable per-node
// allocation file at /var/lib/flock/allocations.json.
// This file implements the durable per-node allocation file at
// /var/lib/flock/allocations.json. The package-level doc lives in doc.go.

package agent

import (

@@ -1,3 +1,8 @@
// Package v1alpha1 contains the operator-facing API types for flock.
//
// Stability: alpha. The shape of these types may change in incompatible ways
// between minor releases. CRDs are versioned and the agent reads only its
// pinned version.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@@ -6,27 +11,78 @@ import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
//
// The agent reads this on startup and via informer for live updates. There is
// no controller and no auto-allocation — purely declarative input.
//
// A NodeConfig's name MUST equal the Kubernetes node name it configures
// (NodeConfigs are cluster-scoped). The agent ignores all NodeConfigs whose
// name does not match its own node.
type NodeConfigSpec struct {
// CIDR6 is the set of IPv6 CIDRs this node owns and advertises as BGP
// aggregates. Pod IPv6 addresses are allocated from these.
// aggregates. Pod IPv6 addresses are allocated from these. May be empty
// only if Defaults disables IPv6 for every pod on this node.
CIDR6 []string `json:"cidr6,omitempty"`

// CIDR4 is the set of IPv4 CIDRs this node owns and advertises as BGP
// aggregates. Pod IPv4 addresses are allocated from these.
// aggregates. Pod IPv4 addresses are allocated from these. May be empty
// when no pod on this node ever opts into IPv4.
CIDR4 []string `json:"cidr4,omitempty"`

// BGP configures the BGP sessions this node establishes upstream.
BGP BGPSpec `json:"bgp"`

// Defaults sets the per-node baseline for which address families a pod
// receives when its own annotations don't say. Pod-level
// `flock.fritzlab.net/ipv6` and `flock.fritzlab.net/ipv4` annotations
// always override these defaults.
//
// When a field is unset (nil), the agent falls back to its built-in
// baseline of IPv6=true, IPv4=false. When the whole Defaults block is
// nil, both built-in defaults apply.
//
// Typical uses:
// - dual-stack node: Defaults: { ipv6: true, ipv4: true }
// - IPv4-only node: Defaults: { ipv6: false, ipv4: true }
// - default (omit Defaults entirely): IPv6-only.
//
// Validation: at least one of IPv6 or IPv4 must end up true after merging
// (annotations + defaults + built-in baseline). The agent rejects pods
// that resolve to neither.
Defaults *FamilyDefaults `json:"defaults,omitempty"`
}

// FamilyDefaults is the per-node default for which address families a pod
// receives when its annotations don't specify. Each field is a pointer so
// "unset" is distinguishable from explicit "false".
type FamilyDefaults struct {
// IPv6 is the default value for the `flock.fritzlab.net/ipv6` annotation.
// nil → fall back to the built-in baseline (true).
IPv6 *bool `json:"ipv6,omitempty"`

// IPv4 is the default value for the `flock.fritzlab.net/ipv4` annotation.
// nil → fall back to the built-in baseline (false).
IPv4 *bool `json:"ipv4,omitempty"`
}

// BGPSpec describes this node's BGP speaker configuration. Each upstream peer
// becomes one BGP session in the rendered bird.conf.
type BGPSpec struct {
ASN uint32 `json:"asn"`
// ASN is this node's local autonomous system number. flock uses private
// ASNs in the 64512-65534 range by convention but accepts any value.
ASN uint32 `json:"asn"`

// Peers is the set of upstream BGP neighbors. At least one is required
// for BGP advertisement to function. Multiple peers of the same family
// are allowed (multi-homing).
Peers []BGPPeer `json:"peers"`
}

// BGPPeer is a single upstream BGP neighbor.
type BGPPeer struct {
// Address is the peer's IP. May be IPv4 or IPv6. The agent picks an
// appropriate local source address on the same subnet.
Address string `json:"address"`
ASN uint32 `json:"asn"`

// ASN is the peer's remote ASN.
ASN uint32 `json:"asn"`
}

// NodeConfig is the Schema for the nodeconfigs API. NodeConfigs are cluster-

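The `Defaults` doc comment defines a three-level precedence (pod annotation, then the node's `FamilyDefaults`, then the built-in baseline of IPv6 on and IPv4 off) followed by a check that at least one family survives. A minimal sketch of that merge follows. The `resolveFamilies` and `override` names are hypothetical, and parsing annotation values with `strconv.ParseBool` is an assumption; the excerpt does not say which spellings the agent accepts.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

const (
	annoIPv6 = "flock.fritzlab.net/ipv6"
	annoIPv4 = "flock.fritzlab.net/ipv4"
)

// FamilyDefaults mirrors the v1alpha1 type: a nil pointer means "unset".
type FamilyDefaults struct {
	IPv6 *bool
	IPv4 *bool
}

// resolveFamilies applies annotation > node Defaults > built-in baseline.
func resolveFamilies(annotations map[string]string, d *FamilyDefaults) (v6, v4 bool, err error) {
	v6, v4 = true, false // built-in baseline
	if d != nil {
		if d.IPv6 != nil {
			v6 = *d.IPv6
		}
		if d.IPv4 != nil {
			v4 = *d.IPv4
		}
	}
	if v6, err = override(annotations, annoIPv6, v6); err != nil {
		return false, false, err
	}
	if v4, err = override(annotations, annoIPv4, v4); err != nil {
		return false, false, err
	}
	if !v6 && !v4 {
		return false, false, errors.New("pod resolves to neither IPv6 nor IPv4")
	}
	return v6, v4, nil
}

// override replaces cur with the annotation's value when the key is present.
func override(annotations map[string]string, key string, cur bool) (bool, error) {
	s, ok := annotations[key]
	if !ok {
		return cur, nil
	}
	b, err := strconv.ParseBool(s)
	if err != nil {
		return false, fmt.Errorf("annotation %s=%q: %w", key, s, err)
	}
	return b, nil
}

func main() {
	yes, no := true, false
	fmt.Println(resolveFamilies(nil, nil))                         // baseline: IPv6-only
	fmt.Println(resolveFamilies(nil, &FamilyDefaults{IPv4: &yes})) // dual-stack node
	fmt.Println(resolveFamilies(map[string]string{annoIPv4: "true"}, &FamilyDefaults{IPv6: &no}))
}
```

Using `*bool` for the defaults, as the API type does, is what lets the sketch tell "the operator said false" apart from "the operator said nothing".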
@@ -0,0 +1,103 @@
package embed

import (
"net"
"testing"
)

// FuzzEmbed verifies that Embed never panics and that any successful return
// keeps the output address inside the requested network.
func FuzzEmbed(f *testing.F) {
type seed struct {
prefix string
fields string // comma-separated, mapped below to []Field
ns, pod string
image string
fallback string
nNibble byte
}
for _, s := range []seed{
{"2602:817:3000:f001::/64", "namespace,pod,image", "mail", "stalwart-0", "", "ctr", 0xe},
{"2001:db8::/64", "namespace", "ns", "p", "", "", 0},
{"2001:db8::/96", "pod", "", "podname", "", "ctr", 0xf},
{"2001:db8::/48", "namespace,pod", "ns", "p", "", "ctr", 0x1},
{"2001:db8::/120", "namespace", "n", "p", "", "ctr", 0x0}, // 8 host bits (2 nibbles)
{"2001:db8::/124", "namespace", "n", "p", "", "ctr", 0x0}, // 4 host bits (1 nibble)
{"2001:db8::/127", "namespace", "n", "p", "", "ctr", 0x0}, // not nibble-aligned
{"2001:db8::/63", "namespace", "n", "p", "", "ctr", 0x0}, // not nibble-aligned
{"2001:db8::/64", "namespace,pod,image", "", "", "sha256:abcdef0123456789aabbccddeeff00112233445566778899aabbccddeeff0011", "", 0xa},
{"2001:db8::/64", "namespace,pod,image", "", "", "", "ctr", 0xa},
{"2001:db8::/64", "namespace", "🦆", "🐧", "", "", 0},
{"2001:db8::/64", "namespace", "ns\x00\x00", "p", "", "", 0},
} {
f.Add(s.prefix, s.fields, s.ns, s.pod, s.image, s.fallback, s.nNibble)
}

f.Fuzz(func(t *testing.T, prefix, fieldsStr, ns, pod, image, fallback string, nNibble byte) {
_, network, err := net.ParseCIDR(prefix)
if err != nil {
return
}
fields, ok := decodeFields(fieldsStr)
if !ok {
return
}
got, err := Embed(network, fields, Values{
Namespace: ns,
Pod: pod,
Image: image,
ImageFallback: fallback,
}, nNibble)
if err != nil {
return
}
if !network.Contains(got) {
t.Fatalf("Embed(%s, %v) = %s, outside network", prefix, fields, got)
}
// Property: low nibble of last byte equals nNibble & 0x0F.
if want := nNibble & 0x0F; got[len(got)-1]&0x0F != want {
t.Fatalf("low nibble = %x, want %x", got[len(got)-1]&0x0F, want)
}
})
}

func decodeFields(s string) ([]Field, bool) {
if s == "" {
return nil, false
}
var out []Field
cur := []byte{}
flush := func() bool {
if len(cur) == 0 {
return true
}
switch string(cur) {
case string(FieldNamespace):
out = append(out, FieldNamespace)
case string(FieldPod):
out = append(out, FieldPod)
case string(FieldImage):
out = append(out, FieldImage)
default:
return false
}
cur = cur[:0]
return true
}
for i := 0; i < len(s); i++ {
if s[i] == ',' {
if !flush() {
return nil, false
}
continue
}
cur = append(cur, s[i])
}
if !flush() {
return nil, false
}
if len(out) == 0 {
return nil, false
}
return out, true
}
+129
-6
@@ -9,6 +9,7 @@ import (
"fmt"
"net"
"sort"
"strings"
"text/template"
)

@@ -118,28 +119,150 @@ protocol bgp upstream4_{{$i}} {
{{end}}{{end}}`

// Render produces the bird.conf text.
//
// The output is deterministic: the same NodeBGP input always produces the
// same string. CIDR lists, anycast lists, and peer lists are sorted before
// templating so that the only way the rendered config changes is when
// semantically meaningful inputs change. This stability matters because
// BirdManager compares Render output against the last-written config to
// avoid superfluous birdc reloads.
//
// Render validates every operator-supplied value that flows into the
// templated output (peer addresses, CIDRs, anycast IPs, source addresses)
// so a malformed NodeConfig or annotation cannot produce a malformed
// bird.conf — even one that BIRD would later reject.
func Render(in NodeBGP) (string, error) {
if in.RouterID == "" {
return "", fmt.Errorf("RouterID is required")
return "", fmt.Errorf("bird render: RouterID is required")
}
if net.ParseIP(in.RouterID) == nil {
return "", fmt.Errorf("bird render: RouterID %q is not a valid IP", in.RouterID)
}
if in.LocalASN == 0 {
return "", fmt.Errorf("LocalASN is required")
return "", fmt.Errorf("bird render: LocalASN is required")
}
// Stable order — important so config changes only when something real
// changes (avoids needless birdc reloads).
if err := validateLocalSource(in.LocalV6, "v6"); err != nil {
return "", err
}
if err := validateLocalSource(in.LocalV4, "v4"); err != nil {
return "", err
}
for i, p := range in.Peers {
if err := validatePeer(p); err != nil {
return "", fmt.Errorf("bird render: peer[%d]: %w", i, err)
}
}
if err := validateCIDRs(in.CIDR6, "v6"); err != nil {
return "", fmt.Errorf("bird render: cidr6: %w", err)
}
if err := validateCIDRs(in.CIDR4, "v4"); err != nil {
return "", fmt.Errorf("bird render: cidr4: %w", err)
}
if err := validateAnycastIPs(in.Anycast6, "v6"); err != nil {
return "", fmt.Errorf("bird render: anycast6: %w", err)
}
if err := validateAnycastIPs(in.Anycast4, "v4"); err != nil {
return "", fmt.Errorf("bird render: anycast4: %w", err)
}

in = normalize(in)

t, err := template.New("bird").Parse(tpl)
if err != nil {
return "", err
return "", fmt.Errorf("bird template parse: %w", err)
}
var buf bytes.Buffer
if err := t.Execute(&buf, in); err != nil {
return "", err
return "", fmt.Errorf("bird template execute: %w", err)
}
return buf.String(), nil
}

// validatePeer checks that a peer entry has a parseable IP whose family
// matches its declared Family field, and a non-zero ASN.
func validatePeer(p Peer) error {
if p.ASN == 0 {
return fmt.Errorf("ASN must be non-zero")
}
ip := net.ParseIP(p.Address)
if ip == nil {
return fmt.Errorf("address %q is not a valid IP", p.Address)
}
isV4 := ip.To4() != nil
switch p.Family {
case "v6":
if isV4 {
return fmt.Errorf("address %q is IPv4 but Family is v6", p.Address)
}
case "v4":
if !isV4 {
return fmt.Errorf("address %q is IPv6 but Family is v4", p.Address)
}
default:
return fmt.Errorf("Family %q must be v6 or v4", p.Family)
}
return nil
}

// validateCIDRs parses each entry as a CIDR and rejects family mismatches.
// fam must be "v6" or "v4".
func validateCIDRs(cidrs []string, fam string) error {
for _, c := range cidrs {
_, n, err := net.ParseCIDR(c)
if err != nil {
return fmt.Errorf("invalid CIDR %q: %w", c, err)
}
isV4 := n.IP.To4() != nil
if fam == "v6" && isV4 {
return fmt.Errorf("CIDR %q is IPv4, expected IPv6", c)
}
if fam == "v4" && !isV4 {
return fmt.Errorf("CIDR %q is IPv6, expected IPv4", c)
}
}
return nil
}

// validateAnycastIPs parses each entry as a literal IP (no prefix) and rejects
// family mismatches.
func validateAnycastIPs(ips []string, fam string) error {
for _, s := range ips {
ip := net.ParseIP(s)
if ip == nil {
return fmt.Errorf("invalid IP %q", s)
}
isV4 := ip.To4() != nil
if fam == "v6" && isV4 {
return fmt.Errorf("IP %q is IPv4, expected IPv6", s)
}
if fam == "v4" && !isV4 {
return fmt.Errorf("IP %q is IPv6, expected IPv4", s)
}
}
return nil
}

// validateLocalSource validates an optional LocalV6/LocalV4 source address.
// Empty is allowed (BIRD picks its own); non-empty must be a parseable IP of
// the matching family.
func validateLocalSource(s, fam string) error {
if s == "" {
return nil
}
ip := net.ParseIP(s)
if ip == nil {
return fmt.Errorf("bird render: Local%s %q is not a valid IP", strings.ToUpper(fam), s)
}
isV4 := ip.To4() != nil
if fam == "v6" && isV4 {
return fmt.Errorf("bird render: LocalV6 %q is IPv4", s)
}
if fam == "v4" && !isV4 {
return fmt.Errorf("bird render: LocalV4 %q is IPv6", s)
}
return nil
}

func normalize(in NodeBGP) NodeBGP {
cp := in
cp.CIDR6 = sortedUnique(in.CIDR6)

@@ -0,0 +1,93 @@
package bird

import (
"strings"
"testing"
)

// FuzzRender drives the bird template with a wide range of inputs and
// confirms two safety properties:
//
// 1. Render never panics.
// 2. On nil-error return, the output is deterministic (calling Render
// twice with the same input yields byte-identical output) and contains
// no unbalanced braces (a smoke test for malformed template branches).
func FuzzRender(f *testing.F) {
type seed struct {
routerID string
asn uint32
peerAddr string
peerASN uint32
cidr6 string
cidr4 string
anycast6 string
anycast4 string
localV6 string
localV4 string
}
seeds := []seed{
{routerID: "10.0.0.1", asn: 65101, peerAddr: "2001:db8::1", peerASN: 65000, cidr6: "2001:db8:f001::/64"},
{routerID: "172.25.25.101", asn: 65101, peerAddr: "172.25.25.1", peerASN: 65000, cidr4: "172.25.210.0/24"},
{routerID: "10.0.0.1", asn: 65101, peerAddr: "2001:db8::1", peerASN: 65000, cidr6: "2001:db8:f001::/64", anycast6: "2001:db8:a::1"},
{routerID: "10.0.0.1", asn: 65101, peerAddr: "10.0.0.2", peerASN: 65000, cidr4: "10.0.0.0/24", anycast4: "10.255.0.1"},
{routerID: "10.0.0.1", asn: 65101}, // no peer, no cidrs
{routerID: "", asn: 65101, peerAddr: "10.0.0.2", peerASN: 1}, // empty routerID → expect error
{routerID: "10.0.0.1", asn: 0, peerAddr: "10.0.0.2", peerASN: 1}, // zero ASN → expect error
// Backtick-bearing inputs to defend the template against accidental
// closure of the raw-string literal.
{routerID: "10.0.0.1`", asn: 65101},
// Newlines and template-meta in user-supplied addresses
{routerID: "10.0.0.1", asn: 65101, peerAddr: "2001:db8::1\n{{kaboom}}", peerASN: 65000, cidr6: "2001:db8:f001::/64"},
}
for _, s := range seeds {
f.Add(s.routerID, s.asn, s.peerAddr, s.peerASN, s.cidr6, s.cidr4, s.anycast6, s.anycast4, s.localV6, s.localV4)
}

f.Fuzz(func(t *testing.T, routerID string, asn uint32, peerAddr string, peerASN uint32, cidr6, cidr4, anycast6, anycast4, localV6, localV4 string) {
in := NodeBGP{
RouterID: routerID,
LocalASN: asn,
LocalV6: localV6,
LocalV4: localV4,
}
// Add the peer in whichever family it belongs to, if any. FamilyOf
// returns "" for non-IPs; that test exercises the "skip unknown
// family" branch in the bird agent code path.
if fam := FamilyOf(peerAddr); fam != "" {
in.Peers = []Peer{{Family: fam, Address: peerAddr, ASN: peerASN}}
}
if cidr6 != "" {
in.CIDR6 = []string{cidr6}
}
if cidr4 != "" {
in.CIDR4 = []string{cidr4}
}
if anycast6 != "" {
in.Anycast6 = []string{anycast6}
}
if anycast4 != "" {
in.Anycast4 = []string{anycast4}
}

out, err := Render(in)
if err != nil {
return
}

// Determinism.
out2, err := Render(in)
if err != nil {
t.Fatalf("Render became flaky: first ok, second %v", err)
}
if out != out2 {
t.Fatalf("Render not deterministic on identical input")
}

// Smoke test for balanced braces. The template uses `{` and `}`
// as BIRD's block delimiters; if our template engine ever
// produced an unbalanced output we'd catch it here.
if got := strings.Count(out, "{") - strings.Count(out, "}"); got != 0 {
t.Fatalf("unbalanced braces: %d", got)
}
})
}
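FuzzRender's smoke test compares the total number of `{` and `}` characters, which catches a missing or extra delimiter but not one in the wrong order. A sketch of a slightly stricter check that tracks nesting depth follows; `countBalanced` and `bracesBalanced` are hypothetical names, and the sample strings are illustrative text rather than real bird.conf output. Both checks remain smoke tests: a brace inside a BIRD `#` comment or a quoted string would confuse either one.

```go
package main

import (
	"fmt"
	"strings"
)

// countBalanced reproduces FuzzRender's check: equal numbers of '{' and '}'.
func countBalanced(s string) bool {
	return strings.Count(s, "{")-strings.Count(s, "}") == 0
}

// bracesBalanced also tracks nesting depth, so a '}' that appears before its
// matching '{' fails even when the totals agree.
func bracesBalanced(s string) bool {
	depth := 0
	for _, r := range s {
		switch r {
		case '{':
			depth++
		case '}':
			depth--
			if depth < 0 {
				return false
			}
		}
	}
	return depth == 0
}

func main() {
	for _, s := range []string{
		"protocol bgp upstream6_0 { ipv6 { import none; }; }",
		"} protocol static {", // totals match, order does not
		"protocol device {",
	} {
		fmt.Printf("count=%v depth=%v  %q\n", countBalanced(s), bracesBalanced(s), s)
	}
}
```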
@@ -0,0 +1,11 @@
go test fuzz v1
string("0")
uint32(65101)
string("0")
uint32(1)
string("")
string("")
string("")
string("}")
string("")
string("")
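The file above is a Go fuzzing corpus entry: a `go test fuzz v1` header followed by one typed literal per `f.Fuzz` argument, in parameter order. Mapped onto FuzzRender's signature it reads routerID="0", asn=65101, peerAddr="0", peerASN=1 and anycast4="}", with the other strings empty. It is likely a saved regression input, since "0" is not a valid IP and the new RouterID check in `Render` rejects it. The helper below is a hypothetical sketch that writes the same layout for the two types FuzzRender uses; the go command has its own encoder, and this is not it.

```go
package main

import (
	"fmt"
	"strings"
)

// corpusEntry writes values in the "go test fuzz v1" layout: a version line,
// then one Go-syntax literal per fuzz argument. Only string and uint32 are
// handled, because FuzzRender takes nothing else.
func corpusEntry(vals ...any) (string, error) {
	var b strings.Builder
	b.WriteString("go test fuzz v1\n")
	for i, v := range vals {
		switch t := v.(type) {
		case string:
			fmt.Fprintf(&b, "string(%q)\n", t)
		case uint32:
			fmt.Fprintf(&b, "uint32(%d)\n", t)
		default:
			return "", fmt.Errorf("arg %d: unsupported type %T", i, v)
		}
	}
	return b.String(), nil
}

func main() {
	// routerID, asn, peerAddr, peerASN, cidr6, cidr4, anycast6, anycast4, localV6, localV4
	entry, err := corpusEntry("0", uint32(65101), "0", uint32(1), "", "", "", "}", "", "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(entry)
}
```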