-
Story
-
Resolution: Done
-
Normal
Our "Updating" condition is currently derived from a set of other Node conditions, some of which reflect things that may be outside the MCO's control (DiskPressure, Unschedulable, etc.). We need to factor those conditions out unless the MCO is responsible for them.
As of 4.12 we have controller-driven cordons/drains, which leave annotations on the Node object, e.g.:
machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-worker-561b9f700f58ed5ff139246f8d9a5b3c
machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-worker-561b9f700f58ed5ff139246f8d9a5b3c
We can use these to determine whether the MCO is actually in the middle of doing anything, rather than carrying forward our old assumption that "unavailable nodes means we're updating". A node can be cordoned by all sorts of things that aren't the MCO.
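A minimal Go sketch of that check, to make the idea concrete. It is not the MCO's actual code. The function name `mcoOwnsUnavailability` is hypothetical, and the `drain-` value prefix is an assumption by analogy with the `uncordon-` prefix in the example above. The sketch treats a node as "ours" when the two annotations disagree (a requested action has not been applied yet) or when both record a drain.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Annotation keys as shown in the example above.
	desiredDrainKey     = "machineconfiguration.openshift.io/desiredDrain"
	lastAppliedDrainKey = "machineconfiguration.openshift.io/lastAppliedDrain"

	// ASSUMPTION: drain requests use a "drain-" prefix, mirroring the
	// "uncordon-" prefix seen in the example annotations.
	drainPrefix = "drain-"
)

// mcoOwnsUnavailability (hypothetical helper) reports whether a node's
// unavailability can be attributed to the MCO, based only on its
// drain annotations.
func mcoOwnsUnavailability(annotations map[string]string) bool {
	desired := annotations[desiredDrainKey]
	lastApplied := annotations[lastAppliedDrainKey]

	// No drain was ever requested: any cordon came from something else.
	if desired == "" {
		return false
	}
	// A requested action has not been applied yet: work is in flight.
	if desired != lastApplied {
		return true
	}
	// The request was applied; the node is still ours only if the
	// applied action was a drain rather than an uncordon.
	return strings.HasPrefix(desired, drainPrefix)
}

func main() {
	uncordoned := map[string]string{
		desiredDrainKey:     "uncordon-rendered-worker-561b9f700f58ed5ff139246f8d9a5b3c",
		lastAppliedDrainKey: "uncordon-rendered-worker-561b9f700f58ed5ff139246f8d9a5b3c",
	}
	fmt.Println("node is ours:", mcoOwnsUnavailability(uncordoned))
}
```

With a check like this, a node that is Unschedulable while both annotations record an uncordon would not count toward "Updating".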
I could see this resulting in another condition to capture "I have an obstacle to updating these nodes, but they have not experienced a failure and are not degraded".