# dns-webhook

A Kubernetes [MutatingAdmissionWebhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook) that rewrites the DNS configuration of newly created pods at admission time, before they start.

## What it does

When a pod is created with the default `dnsPolicy: ClusterFirst`, this webhook intercepts the request and:
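
1. **Picks 3 random nameservers** from the pool of 4 production auth-dns pods (ns1–ns4). This spreads DNS query load across the pool instead of every pod always hitting the same two servers. Three is the ceiling: Kubernetes accepts at most 3 entries in `dnsConfig.nameservers`, and the glibc resolver only uses the first 3.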
2. **Sets search domains** appropriate for the pod's namespace so short service names resolve correctly.
3. **Enables `edns0`**: the resolver advertises a UDP payload size larger than 512 bytes, so large responses (DNSSEC-signed answers, big TXT records) arrive over UDP instead of being truncated and retried over TCP.
4. **Enables `rotate`**: the resolver round-robins across the configured nameservers instead of always trying the first one, spreading queries evenly.

Pods that opt out (`dnsPolicy: None`, `Default`, or `ClusterFirstWithHostNet`) are passed through unchanged.

## Architecture

```
kubelet / kubectl apply
        │
        ▼
Kubernetes API server
        │  (Pod CREATE request)
        ▼
MutatingAdmissionWebhook ──► dns-webhook pod (this service)
        │                          │
        │ ◄── JSON Patch ──────────┘
        │     replace dnsPolicy → None
        │     add dnsConfig { nameservers, searches, options }
        ▼
Pod stored with rewritten DNS config
```

The webhook runs as a Deployment in `kube-system` and is registered via a `MutatingWebhookConfiguration`. cert-manager issues the TLS certificate, and its cainjector populates the configuration's `caBundle` field automatically.

## Logs

Key log lines to watch for during debugging:

| Log line | Meaning |
|----------|---------|
| `dns-webhook starting: cert=... key=...` | Server startup; confirms the TLS certificate and key paths |
| `MUTATE pod=<ns>/<name> uid=... nameservers=[...] op=add\|replace` | Pod was mutated; shows which nameservers were assigned and the JSON Patch op used (`add` or `replace`) |
| `SKIP pod=<ns>/<name> uid=... policy=<policy>` | Pod was not mutated; shows why (a policy other than `ClusterFirst`) |
| `ERROR ...` | Decode/encode failures; should never appear in normal operation |

```bash
# Stream logs from all webhook replicas
kubectl --context sjc001 logs -n kube-system -l app=dns-webhook -f

# Verify a running pod received the correct DNS config
kubectl --context sjc001 exec -n <namespace> <pod> -- cat /etc/resolv.conf
```
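
When checking more than a handful of pods, the `resolv.conf` check can be scripted. Below is a minimal Go sketch, assuming it is fed the text printed by the `cat /etc/resolv.conf` command above; `parseResolvConf` and `looksMutated` are hypothetical helpers, and "mutated" is approximated as exactly three nameservers plus the `edns0` and `rotate` options.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// resolvConf holds the resolv.conf fields the webhook is expected to set.
type resolvConf struct {
	Nameservers []string
	Search      []string
	Options     []string
}

// parseResolvConf extracts nameserver, search, and options lines from resolv.conf text.
func parseResolvConf(text string) resolvConf {
	var rc resolvConf
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 || strings.HasPrefix(fields[0], "#") || strings.HasPrefix(fields[0], ";") {
			continue // blank line or comment
		}
		switch fields[0] {
		case "nameserver":
			if len(fields) > 1 {
				rc.Nameservers = append(rc.Nameservers, fields[1])
			}
		case "search":
			rc.Search = fields[1:] // a later search line overrides an earlier one
		case "options":
			rc.Options = append(rc.Options, fields[1:]...)
		}
	}
	return rc
}

// looksMutated reports whether the config matches what the webhook writes.
func looksMutated(rc resolvConf) bool {
	opts := map[string]bool{}
	for _, o := range rc.Options {
		opts[o] = true
	}
	return len(rc.Nameservers) == 3 && opts["edns0"] && opts["rotate"]
}

func main() {
	sample := `search demo.svc.cluster.local svc.cluster.local cluster.local
nameserver 2001:db8::53:2
nameserver 2001:db8::53:4
nameserver 2001:db8::53:1
options edns0 rotate
`
	rc := parseResolvConf(sample)
	fmt.Println(len(rc.Nameservers), rc.Options, looksMutated(rc)) // 3 [edns0 rotate] true
}
```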

## Deployment

Managed by ArgoCD. Manifests live in the `fritzlab/apps` repo under `sjc001/kube-system/dns-webhook/manifests/`.

```
apps/sjc001/kube-system/dns-webhook/
├── app.yaml                  # ArgoCD Application
└── manifests/
    ├── deployment.yaml       # Webhook pods (2 replicas, dnsPolicy: Default)
    ├── issuer.yaml           # cert-manager: selfSigned → CA → leaf cert
    ├── service.yaml          # ClusterIP Service on :443 → pod :8443
    ├── serviceaccount.yaml
    └── webhook.yaml          # MutatingWebhookConfiguration
```

The `deployment.yaml` image tag (`code.fritzlab.net/fritzlab/dns-webhook:<run_number>`) must be updated whenever a new image is built. CI in this repo produces the image; updating the tag in the `apps` repo is what deploys it, since ArgoCD syncs from there.
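
The tag bump can also be scripted rather than edited by hand. A minimal Go sketch, assuming the image reference appears verbatim as `code.fritzlab.net/fritzlab/dns-webhook:<tag>` in `deployment.yaml`; `bumpImageTag` is a hypothetical helper, not part of this repo or its CI.

```go
package main

import (
	"fmt"
	"regexp"
)

// imageRe matches the dns-webhook image reference; group 1 is everything
// before the tag, and the tag stops at whitespace or a closing quote.
var imageRe = regexp.MustCompile(`(code\.fritzlab\.net/fritzlab/dns-webhook:)[^\s"']+`)

// bumpImageTag (hypothetical) sets the dns-webhook image tag in a manifest to
// runNumber and reports whether the text changed.
func bumpImageTag(manifest string, runNumber int) (string, bool) {
	// ${1} (with braces) keeps the group reference from swallowing the digits after it.
	out := imageRe.ReplaceAllString(manifest, fmt.Sprintf("${1}%d", runNumber))
	return out, out != manifest
}

func main() {
	out, changed := bumpImageTag("        image: code.fritzlab.net/fritzlab/dns-webhook:41\n", 42)
	fmt.Print(out)       // "        image: code.fritzlab.net/fritzlab/dns-webhook:42"
	fmt.Println(changed) // true
}
```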

## Development

```bash
# Build locally
go build ./...

# Run tests (none yet: the mutation logic is straightforward enough that
# end-to-end verification via a test pod is more useful)
go test ./...

# Build container image
docker build -t dns-webhook:local .
```

## Design notes

- **`dnsPolicy: Default` on the webhook pods themselves**: avoids a circular dependency. If cluster DNS is disrupted, the webhook pods can still start because they use the node's `/etc/resolv.conf` directly. Because `Default` pods are passed through, the webhook also never rewrites its own pods.
- **`failurePolicy: Ignore`**: if the webhook is unavailable, pods are admitted without mutation rather than being blocked. Availability of workloads takes priority over DNS load balancing.
- **`imagePullPolicy: IfNotPresent`**: if cluster DNS is down at pod start time, an image pull (which needs DNS to reach the registry) would fail. With `IfNotPresent`, a node that already has the image cached starts the pod without contacting the registry.
- **ClusterIP service (not headless)**: webhook calls are short-lived HTTP requests, so the keepalive starvation problem that affects long-lived connections doesn't apply here. A stable VIP is the conventional pattern for webhook services.
- **Static nameserver IPs**: the auth-dns pods use `cni.projectcalico.org/ipAddrs` to pin their Calico-allocated IPv6 addresses across restarts, making them safe to hardcode here.