netpol: accept established+related at top of every pod chain
Build flock Image / build (push) Has been cancelled
K8s NetworkPolicy applies to the start of new connections; reply packets for established flows (and ICMP related) must not be matched against the explicit allow set. The pod ingress chain previously had only explicit dport allows + a final drop, so any reply to a pod-initiated outbound where the reply's dport (the ephemeral source port) wasn't in the allow set got dropped.

Hit in production 2026-04-26: garage's `garage-admin-restrict` NP allowed dports 3900/80/3901/3903 only. Garage uses kubernetes_discovery to find peers: outbound to kube-apiserver succeeded, replies returned to ephemeral source ports, dropped → "Layout not ready" cluster-wide.

Fix: emit `ct state established,related accept` as the first rule in every pod_<hash>_(ingress|egress) chain. Regression test added.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@@ -39,6 +39,13 @@ func TestRender_DefaultDeny(t *testing.T) {
 	if !strings.Contains(got, `oifname "flock00000001" jump pod_`) {
 		t.Fatalf("missing veth-only ingress jump in base chain:\n%s", got)
 	}
+	// Stateful accept must be present so reply traffic for pod-initiated
+	// outbound (e.g. ephemeral-port replies from kube-apiserver) is not
+	// dropped by the chain's final drop. Regression guard: production hit
+	// this when garage's k8s-discovery → apiserver replies got dropped.
+	if !strings.Contains(got, "ct state established,related accept") {
+		t.Fatalf("missing ct state established,related accept:\n%s", got)
+	}
 }
 
 // TestRender_DualStack — dual-stack pod gets one veth-anchored jump per
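The regression check in this hunk asserts only that the stateful accept appears somewhere in the rendered output. A stricter check on its position could look like the sketch below. `firstRules` is a hypothetical helper, and it assumes a rendered format where each chain opens with `chain <name> {` on its own line and closes with `}`; the real renderer's output may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// firstRules is a hypothetical helper. For every chain whose name starts
// with "pod_", it returns the first non-blank rule line in that chain.
// Chains with no rules at all do not appear in the result.
func firstRules(ruleset string) map[string]string {
	first := map[string]string{}
	current := "" // name of the pod chain being scanned, if any
	for _, raw := range strings.Split(ruleset, "\n") {
		line := strings.TrimSpace(raw)
		switch {
		case strings.HasPrefix(line, "chain pod_") && strings.HasSuffix(line, "{"):
			current = strings.Fields(line)[1]
		case line == "}":
			current = ""
		case current != "" && line != "":
			if _, seen := first[current]; !seen {
				first[current] = line
			}
		}
	}
	return first
}

func main() {
	ruleset := "chain pod_ab12_ingress {\n" +
		"\tct state established,related accept\n" +
		"\ttcp dport 3900 accept\n" +
		"\tdrop\n}\n"
	for name, rule := range firstRules(ruleset) {
		fmt.Printf("%s: %s\n", name, rule)
	}
}
```

A test could then loop over the returned map and fail if any value differs from `ct state established,related accept`, which would catch a renderer that emits the rule but places it after the drop.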