anycast: drop pods from nexthop set on DeletionTimestamp
Previously the AnycastReconciler kept a pod in the nexthop set for as long as its PodReady condition was True. During a rolling restart this leaves a window that opens when the apiserver sets DeletionTimestamp and kubelet begins terminating the pod, and closes only when the probes observe the shutdown and the pod stops reporting Ready. Throughout that window BGP still advertises a path through the dying pod's veth, so in-flight requests get RST'd when the container actually exits.

Fix: introduce podAnycastEligible(pod) = !DeletionTimestamp && Ready, swap it in as the AnycastReconciler's isReady callback, and fire the ready-change callback when DeletionTimestamp transitions (the informer's UpdateFunc previously fired only on Ready transitions).

Result: as soon as the apiserver marks a pod for deletion, the reconciler withdraws the local nexthop and BIRD reannounces the route without it, so sibling replicas absorb the traffic before the pod's terminationGracePeriod elapses.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
+15 -2
@@ -28,6 +28,16 @@ func podReady(pod *corev1.Pod) bool {
 	return false
 }
 
+// podAnycastEligible reports whether a pod should contribute its IP as a
+// nexthop for its anycast IPs. A pod is eligible when it is Ready AND not
+// being deleted. Once the apiserver sets DeletionTimestamp, kubelet has
+// started teardown — kube-proxy will keep routing for terminationGracePeriod
+// but the pod is on the way out; we should withdraw the nexthop immediately
+// so BGP shifts traffic to a sibling before the pod actually exits.
+func podAnycastEligible(pod *corev1.Pod) bool {
+	return pod.DeletionTimestamp == nil && podReady(pod)
+}
+
 // PodCache exposes a Get(ns, name) lookup against a node-scoped Pod
 // informer. ADD/DEL handlers consult it to read annotations + labels for
 // IPAM and (later) NetworkPolicy. Callers can subscribe to Ready
@@ -58,7 +68,7 @@ func StartPodInformer(ctx context.Context, cfg *rest.Config, node string, logger
 
 	_, _ = inf.AddEventHandler(cache.ResourceEventHandlerFuncs{
 		AddFunc: func(obj interface{}) {
-			if pod, ok := obj.(*corev1.Pod); ok && podReady(pod) {
+			if pod, ok := obj.(*corev1.Pod); ok && podAnycastEligible(pod) {
 				pc.fireReady()
 			}
 		},
@@ -68,7 +78,10 @@ func StartPodInformer(ctx context.Context, cfg *rest.Config, node string, logger
 			if oldP == nil || newP == nil {
 				return
 			}
-			if podReady(oldP) != podReady(newP) {
+			// Fire on Ready transition OR DeletionTimestamp transition.
+			// The latter catches "pod was Ready, now being deleted" so the
+			// reconciler withdraws the nexthop before the pod actually exits.
+			if podAnycastEligible(oldP) != podAnycastEligible(newP) {
 				pc.fireReady()
 			}
 		},