# dns-webhook

A Kubernetes [MutatingAdmissionWebhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook) that rewrites DNS configuration on every new pod before it starts.

## What it does

When a pod is created with the default `dnsPolicy: ClusterFirst`, this webhook intercepts the request and:

1. **Picks 3 random nameservers** from the pool of 4 production auth-dns pods (ns1–ns4). This distributes DNS query load instead of every pod always hitting the same two servers.
2. **Sets search domains** appropriate for the pod's namespace so short service names resolve correctly.
3. **Enables `edns0`**: allows DNS responses larger than 512 bytes (needed for DNSSEC and large TXT records).
4. **Enables `rotate`**: cycles through nameservers on each query for even load distribution.

Pods that opt out (`dnsPolicy: None`, `Default`, or `ClusterFirstWithHostNet`) are passed through unchanged.

## Architecture

```
  kubelet / kubectl apply
         │
         ▼
  Kubernetes API server
         │  (Pod CREATE request)
         ▼
  MutatingAdmissionWebhook ──► dns-webhook pod (this service)
         │                          │
         │  ◄── JSON Patch ─────────┘
         │      replace dnsPolicy → None
         │      add dnsConfig { nameservers, searches, options }
         ▼
  Pod stored with rewritten DNS config
```

The webhook runs as a Deployment in `kube-system` and is registered via a `MutatingWebhookConfiguration`. cert-manager issues the TLS certificate; its cainjector populates the `caBundle` field automatically.

## Logs

Key log lines to watch for during debugging:

| Prefix | Meaning |
|--------|---------|
| `dns-webhook starting: cert=... key=...` | Server startup; confirms TLS paths |
| `MUTATE pod=<namespace>/<name> uid=... nameservers=[...] op=add\|replace` | Pod was mutated; shows which nameservers were assigned |
| `SKIP pod=<namespace>/<name> uid=... policy=<policy>` | Pod was not mutated; shows why (non-ClusterFirst policy) |
| `ERROR ...` | Decode/encode failures; should never appear in normal operation |

```bash
# Stream logs from all webhook replicas
kubectl --context sjc001 logs -n kube-system -l app=dns-webhook -f

# Verify a running pod received the correct DNS config
kubectl --context sjc001 exec -n <namespace> <pod> -- cat /etc/resolv.conf
```

## Deployment

Managed by ArgoCD. Manifests live in the `fritzlab/apps` repo under `sjc001/kube-system/dns-webhook/manifests/`.

```
apps/sjc001/kube-system/dns-webhook/
├── app.yaml                  # ArgoCD Application
└── manifests/
    ├── deployment.yaml       # Webhook pods (2 replicas, dnsPolicy: Default)
    ├── issuer.yaml           # cert-manager: selfSigned → CA → leaf cert
    ├── service.yaml          # ClusterIP Service on :443 → pod :8443
    ├── serviceaccount.yaml
    └── webhook.yaml          # MutatingWebhookConfiguration
```

The `deployment.yaml` image tag (`code.fritzlab.net/fritzlab/dns-webhook:<tag>`) must be updated whenever a new image is built. CI in this repo produces the image; update the tag in `apps` to deploy.

## Development

```bash
# Build locally
go build ./...

# Run tests (none yet; the mutation logic is straightforward enough that
# end-to-end verification via a test pod is more useful)
go test ./...

# Build container image
docker build -t dns-webhook:local .
```

## Design notes

- **`dnsPolicy: Default` on the webhook pods themselves**: avoids a circular dependency. If cluster DNS is disrupted, the webhook pods can still start because they use the node's `/etc/resolv.conf` directly.
- **`failurePolicy: Ignore`**: if the webhook is unavailable, pods are admitted without mutation rather than being blocked. Availability of workloads takes priority over DNS load balancing.
- **`imagePullPolicy: IfNotPresent`**: if cluster DNS is down at pod start time, the image pull (which needs DNS to reach the registry) would fail. This policy uses the locally cached image instead.
- **ClusterIP service (not headless)**: webhook calls are short-lived HTTP requests, so the keepalive starvation problem that affects long-lived connections doesn't apply here. A stable VIP is the conventional pattern for webhook services.
- **Static nameserver IPs**: the auth-dns pods use `cni.projectcalico.org/ipAddrs` to pin their Calico-allocated IPv6 addresses across restarts, making them safe to hardcode here.