# dns-webhook
A Kubernetes [MutatingAdmissionWebhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook) that rewrites DNS configuration on every new pod before it starts.
## What it does
When a pod is created with the default `dnsPolicy: ClusterFirst`, this webhook intercepts the request and:
1. **Picks 3 random nameservers** from the pool of 4 production auth-dns pods (ns1 through ns4). This distributes DNS query load instead of every pod always hitting the same two servers.
2. **Sets search domains** appropriate for the pod's namespace so short service names resolve correctly.
3. **Enables `edns0`**: allows DNS responses larger than 512 bytes over UDP (needed for DNSSEC and large TXT records).
4. **Enables `rotate`**: cycles through the configured nameservers on successive queries for even load distribution.
Pods that opt out (`dnsPolicy: None`, `Default`, or `ClusterFirstWithHostNet`) are passed through unchanged.
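
The selection and opt-out rules above can be sketched in Go roughly as follows. This is a minimal illustration, not the webhook's actual source: the function names `shouldMutate` and `pickNameservers` are hypothetical, the pool holds placeholder documentation-range IPv6 addresses rather than the real ns1 through ns4 addresses, and treating an empty `dnsPolicy` as `ClusterFirst` is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
)

// nameserverPool is a placeholder pool. The real auth-dns addresses are not
// listed in this README, so documentation-range IPv6 addresses stand in.
var nameserverPool = []string{
	"2001:db8::1",
	"2001:db8::2",
	"2001:db8::3",
	"2001:db8::4",
}

// shouldMutate reports whether a pod with the given dnsPolicy is rewritten.
// Assumption: an empty policy is treated like the default, ClusterFirst.
func shouldMutate(dnsPolicy string) bool {
	return dnsPolicy == "" || dnsPolicy == "ClusterFirst"
}

// pickNameservers returns n distinct entries from pool in random order.
// If n exceeds the pool size, the whole pool is returned.
func pickNameservers(pool []string, n int) []string {
	if n > len(pool) {
		n = len(pool)
	}
	shuffled := append([]string(nil), pool...) // copy so the pool keeps its order
	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	return shuffled[:n]
}

func main() {
	for _, policy := range []string{"ClusterFirst", "Default", "None"} {
		if !shouldMutate(policy) {
			fmt.Printf("SKIP policy=%s\n", policy)
			continue
		}
		fmt.Printf("MUTATE policy=%s nameservers=%v\n",
			policy, pickNameservers(nameserverPool, 3))
	}
}
```

Shuffling a copy and taking the first three entries guarantees distinct servers, which matters because `resolv.conf` honors at most three `nameserver` lines.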
## Architecture
```
kubelet / kubectl apply
        │
        ▼
Kubernetes API server
        │  (Pod CREATE request)
        ▼
MutatingAdmissionWebhook ──► dns-webhook pod (this service)
        │                         │
        │  ◄── JSON Patch ────────┘
        │      replace dnsPolicy → None
        │      add dnsConfig { nameservers, searches, options }
        ▼
Pod stored with rewritten DNS config
```
The webhook runs as a Deployment in `kube-system` and is registered via a `MutatingWebhookConfiguration`. cert-manager issues the TLS certificate; its cainjector populates the `caBundle` field automatically.
## Logs
Key log lines to watch for during debugging:
| Prefix | Meaning |
|--------|---------|
| `dns-webhook starting: cert=... key=...` | Server startup — confirms TLS paths |
| `MUTATE pod=<ns>/<name> uid=... nameservers=[...] op=add\|replace` | Pod was mutated — shows which nameservers were assigned |
| `SKIP pod=<ns>/<name> uid=... policy=<policy>` | Pod was not mutated — shows why (non-ClusterFirst policy) |
| `ERROR ...` | Decode/encode failures — should never appear in normal operation |
```bash
# Stream logs from all webhook replicas
kubectl --context sjc001 logs -n kube-system -l app=dns-webhook -f
# Verify a running pod received the correct DNS config
kubectl --context sjc001 exec -n <namespace> <pod> -- cat /etc/resolv.conf
```
## Deployment
Managed by ArgoCD. Manifests live in the `fritzlab/apps` repo under
`sjc001/kube-system/dns-webhook/manifests/`.
```
apps/sjc001/kube-system/dns-webhook/
├── app.yaml                   # ArgoCD Application
└── manifests/
    ├── deployment.yaml        # Webhook pods (2 replicas, dnsPolicy: Default)
    ├── issuer.yaml            # cert-manager: selfSigned → CA → leaf cert
    ├── service.yaml           # ClusterIP Service on :443 → pod :8443
    ├── serviceaccount.yaml
    └── webhook.yaml           # MutatingWebhookConfiguration
```
The `deployment.yaml` image tag (`code.fritzlab.net/fritzlab/dns-webhook:<run_number>`) must be updated whenever a new image is built. CI in this repo produces the image; update the tag in `apps` to deploy.
## Development
```bash
# Build locally
go build ./...
# Run tests (none yet — the mutation logic is straightforward enough that
# end-to-end verification via a test pod is more useful)
go test ./...
# Build container image
docker build -t dns-webhook:local .
```
## Design notes
- **`dnsPolicy: Default` on the webhook pods themselves**: avoids a circular dependency — if cluster DNS is disrupted, the webhook pods can still start because they use the node's `/etc/resolv.conf` directly.
- **`failurePolicy: Ignore`**: if the webhook is unavailable, pods are admitted without mutation rather than being blocked. Availability of workloads takes priority over DNS load balancing.
- **`imagePullPolicy: IfNotPresent`**: if cluster DNS is down at pod start time, the image pull (which needs DNS to reach the registry) would fail. This policy uses the locally cached image instead.
- **ClusterIP service (not headless)**: webhook calls are short-lived HTTPS requests, so the keepalive starvation problem that affects long-lived connections doesn't apply here. A stable VIP is the conventional pattern for webhook services.
- **Static nameserver IPs**: the auth-dns pods use `cni.projectcalico.org/ipAddrs` to pin their Calico-allocated IPv6 addresses across restarts, making them safe to hardcode here.