Commit Graph

2 Commits

Author SHA1 Message Date
Donavan Fritz e00579f7ca nodecondition: SSA the NetworkUnavailable condition (don't merge-patch)
The previous implementation used JSON merge-patch (types.MergePatchType)
with a one-element conditions array. JSON merge-patch on arrays is
whole-array replacement, so every 60s flock-agent stomped over the
kubelet-managed conditions (Ready, MemoryPressure, DiskPressure,
PIDPressure), leaving only NetworkUnavailable on the node — until
kubelet's next status post (~5s later) re-set them.
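
The array-replacement behaviour described above can be reproduced without a cluster. The following is a minimal sketch, assuming a hand-rolled RFC 7386 merge (`mergePatch`) and trimmed-down node documents; the helper names and the JSON payloads are illustrative, not taken from flock-agent.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative documents only: a trimmed node status and the one-element patch.
const nodeJSON = `{"status":{"conditions":[
  {"type":"Ready","status":"True"},
  {"type":"MemoryPressure","status":"False"},
  {"type":"DiskPressure","status":"False"},
  {"type":"PIDPressure","status":"False"},
  {"type":"NetworkUnavailable","status":"True"}]}}`

const patchJSON = `{"status":{"conditions":[
  {"type":"NetworkUnavailable","status":"False","reason":"FlockReady"}]}}`

// mergePatch applies an RFC 7386 JSON merge patch to target.
// Objects merge key by key; every other value, arrays included,
// replaces the target value outright.
func mergePatch(target, patch any) any {
	patchObj, ok := patch.(map[string]any)
	if !ok {
		return patch // scalars and arrays replace wholesale
	}
	targetObj, ok := target.(map[string]any)
	if !ok {
		targetObj = map[string]any{}
	}
	for k, v := range patchObj {
		if v == nil {
			delete(targetObj, k) // a null value removes the key
			continue
		}
		targetObj[k] = mergePatch(targetObj[k], v)
	}
	return targetObj
}

// conditionTypes lists the condition types found under status.conditions.
func conditionTypes(node any) []string {
	obj, _ := node.(map[string]any)
	status, _ := obj["status"].(map[string]any)
	conds, _ := status["conditions"].([]any)
	var out []string
	for _, c := range conds {
		if m, ok := c.(map[string]any); ok {
			if t, ok := m["type"].(string); ok {
				out = append(out, t)
			}
		}
	}
	return out
}

// typesAfterMergePatch decodes both documents, applies the patch, and
// returns the condition types that survive.
func typesAfterMergePatch(nodeDoc, patchDoc string) ([]string, error) {
	var node, patch any
	if err := json.Unmarshal([]byte(nodeDoc), &node); err != nil {
		return nil, err
	}
	if err := json.Unmarshal([]byte(patchDoc), &patch); err != nil {
		return nil, err
	}
	return conditionTypes(mergePatch(node, patch)), nil
}

func main() {
	left, err := typesAfterMergePatch(nodeJSON, patchJSON)
	if err != nil {
		panic(err)
	}
	fmt.Println(left) // prints [NetworkUnavailable]
}
```

The four kubelet-managed entries are gone after the patch because merge-patch has no notion of list keys.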

Symptom: `kubectl get nodes` flickered, with one node briefly showing
Unknown each polling tick. k9s lit up red on rotating nodes. (kube-
controller-manager is also a write contender and was correctly noted
in the field-managers list.)

Switch to Server-Side Apply against the status subresource with
fieldManager=flock-agent and Force=true. NodeStatus.Conditions is a
listType=map keyed by `type`, so SSA merges by type — we declare
ownership of only the NetworkUnavailable entry and leave kubelet's
entries untouched. Force lets us reclaim the condition if a previous
CNI manager (e.g. calico-node finalizer leftovers) still owns it.
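
The keyed merge can be illustrated in isolation. This is a simplified sketch, not the API server's implementation: the `Condition` type and `mergeByType` helper are hypothetical, and the sketch ignores managedFields, conflict detection, and removal of entries the applier stops declaring. It only shows that an applied entry replaces the live entry with the same `type` and leaves the rest alone.

```go
package main

import "fmt"

// Condition is a hypothetical, trimmed stand-in for a node condition.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// mergeByType upserts each applied entry into a copy of live, matching on Type.
// Entries in live that the applier does not mention are kept unchanged.
func mergeByType(live, applied []Condition) []Condition {
	out := make([]Condition, len(live))
	copy(out, live)

	index := make(map[string]int, len(out))
	for i, c := range out {
		index[c.Type] = i
	}
	for _, c := range applied {
		if i, ok := index[c.Type]; ok {
			out[i] = c // same key: replace this entry only
			continue
		}
		index[c.Type] = len(out)
		out = append(out, c) // new key: append
	}
	return out
}

func main() {
	// Illustrative values; a real node carries more conditions and fields.
	live := []Condition{
		{Type: "Ready", Status: "True", Reason: "KubeletReady"},
		{Type: "MemoryPressure", Status: "False", Reason: "KubeletHasSufficientMemory"},
		{Type: "NetworkUnavailable", Status: "True", Reason: "CalicoIsDown"},
	}
	applied := []Condition{
		{Type: "NetworkUnavailable", Status: "False", Reason: "FlockReady"},
	}
	for _, c := range mergeByType(live, applied) {
		fmt.Printf("%s=%s (%s)\n", c.Type, c.Status, c.Reason)
	}
}
```

With client-go, a request of this kind is typically sent as a Patch of type types.ApplyPatchType against the "status" subresource, with FieldManager and Force set in metav1.PatchOptions; the exact call used by flock-agent is not shown in this message.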

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 08:55:03 -05:00
Donavan Fritz c7fb159632 agent: maintain NetworkUnavailable=False on owned nodes
When Calico shuts down on a flock-labeled node, calico-node sets
NetworkUnavailable=True with reason CalicoIsDown. Nothing replaces it,
so the node lifecycle controller (in kube-controller-manager) taints the
node with node.kubernetes.io/network-unavailable:NoSchedule and new pods
can't land.

flock-agent now patches Status.Conditions every 60s with
NetworkUnavailable=False (reason=FlockReady). RBAC: nodes/status patch.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 23:11:47 -05:00