  1. Machine Config Operator
  2. MCO-453

What should the MCP condition "updating" mean?


    • Type: Spike
    • Resolution: Won't Do
    • Priority: Minor
    • OCPSTRAT-763 - [TechPreview]Disconnected Cluster Update and Boot without local image registry

      Background:

       

      The MachineConfig pool currently has a condition of "updating" that depends on whether or not a machine happens to be cordoned (among other things).

      The logic that decides this is here: https://github.com/openshift/machine-config-operator/blob/5cc821eb953c85764c2a092d53aaae34e1f1ac17/pkg/controller/node/status.go#L77

      allUpdated := updatedMachineCount == machineCount &&
              readyMachineCount == machineCount &&
              unavailableMachineCount == 0

       

      And if you chase all those states back through the code you end up with more or less:

      state         Per-Node Logic That Decides When We're In This State                             Notes
      done          currentConfig == desiredConfig AND MCD state is "Done"
      updated       done AND currentConfig == pool.Spec.Configuration.Name
      ready         NodeReady AND !NodeDiskPressure AND !NodeNetworkUnavailable AND !Unschedulable   disk pressure doesn't really surface anywhere, so it's kind of sneaky
      unavailable   !ready OR ( !done AND (Degraded OR Unreconcilable) )
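
      The table reads as four boolean predicates over a single node. Below is a minimal sketch in Go, assuming a hypothetical nodeView struct that flattens the relevant fields. The struct, its field names, and the treatment of "Degraded" and "Unreconcilable" as MCD state strings are assumptions for illustration, not the MCO's actual types:

```go
package main

import "fmt"

// nodeView is a hypothetical, flattened view of one node. The field names
// are illustrative only and do not match the MCO's real types.
type nodeView struct {
	CurrentConfig      string
	DesiredConfig      string
	MCDState           string // assumed values: "Done", "Degraded", "Unreconcilable"
	NodeReady          bool
	DiskPressure       bool
	NetworkUnavailable bool
	Unschedulable      bool
}

// isDone: currentConfig == desiredConfig AND MCD state is "Done".
func isDone(n nodeView) bool {
	return n.CurrentConfig == n.DesiredConfig && n.MCDState == "Done"
}

// isUpdated: done AND currentConfig == the pool's target config name.
func isUpdated(n nodeView, poolConfig string) bool {
	return isDone(n) && n.CurrentConfig == poolConfig
}

// isReady: NodeReady AND !NodeDiskPressure AND !NodeNetworkUnavailable AND !Unschedulable.
func isReady(n nodeView) bool {
	return n.NodeReady && !n.DiskPressure && !n.NetworkUnavailable && !n.Unschedulable
}

// isUnavailable: !ready OR (!done AND (Degraded OR Unreconcilable)).
func isUnavailable(n nodeView) bool {
	failed := n.MCDState == "Degraded" || n.MCDState == "Unreconcilable"
	return !isReady(n) || (!isDone(n) && failed)
}

func main() {
	// A fully updated node that someone cordoned by hand.
	n := nodeView{
		CurrentConfig: "rendered-worker-a", // hypothetical config name
		DesiredConfig: "rendered-worker-a",
		MCDState:      "Done",
		NodeReady:     true,
		Unschedulable: true,
	}
	fmt.Println("updated:", isUpdated(n, "rendered-worker-a"))
	fmt.Println("ready:", isReady(n))
	fmt.Println("unavailable:", isUnavailable(n))
}
```

      In this sketch the cordoned node is "updated" but neither "ready" nor available, which is the combination the rest of this issue is about.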

       

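      So, as the table shows, if a node becomes "Unschedulable" for any reason (even if the MCO didn't do it), it stops counting as "ready" and therefore counts as "unavailable", which makes allUpdated false for the whole pool.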

       

      This is customer-facing: it shows up in the "oc" output for a MachineConfigPool, and "updating" also gets set as a condition on the pool, so we look like we're updating when we're really not.

      UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      True      False      False      3              3                   3                     0                      3d22h
      False     True       False      4              3                   4                     0                      3d22h 
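
      The second row of the output above (4 machines, 4 updated, 3 ready) can be reproduced with a small sketch of the pool-level aggregation. This is a simplified illustration in Go, assuming a hypothetical machine struct with precomputed per-node booleans; it mirrors the allUpdated expression quoted earlier but is not the controller's actual code:

```go
package main

import "fmt"

// machine is a hypothetical per-node summary. In the real controller these
// values are derived from node annotations and conditions.
type machine struct {
	Updated     bool
	Ready       bool
	Unavailable bool
}

// allUpdated mirrors the quoted expression: every machine must be updated
// and ready, and none may be unavailable.
func allUpdated(machines []machine) bool {
	var updated, ready, unavailable int
	for _, m := range machines {
		if m.Updated {
			updated++
		}
		if m.Ready {
			ready++
		}
		if m.Unavailable {
			unavailable++
		}
	}
	total := len(machines)
	return updated == total && ready == total && unavailable == 0
}

func main() {
	// Four machines, all on the pool's config, one cordoned by an admin.
	pool := []machine{
		{Updated: true, Ready: true},
		{Updated: true, Ready: true},
		{Updated: true, Ready: true},
		{Updated: true, Ready: false, Unavailable: true}, // cordoned
	}
	fmt.Println("allUpdated:", allUpdated(pool))
}
```

      Every machine here already runs the target config, yet allUpdated is false, so the pool is reported as not updated even though nothing is being updated.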

      We need to find a way to tell the truth.

      Goal:

      • What should a pool state of "updating" mean in this context?

      Should it mean:

      • At least one machine-config-daemon is working?
      • A new desiredConfig has been applied to at least one node and it hasn't been reconciled?
      • There is at least one machine in the pool that hasn't been updated completely yet? 
      • "Everything isn't done yet, so therefore I should be updating" (vs. "I am updating")
      • Other?
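
      To compare the candidate meanings, each can be written as a predicate over the pool and evaluated on the cordoned-node case. The sketch below is in Go and makes several assumptions: the node struct is hypothetical, "Working" is assumed to be the MCD state for a daemon that is mid-update, and notEverythingDone is only a rough approximation of today's behavior:

```go
package main

import "fmt"

// node is a hypothetical, simplified view of one machine in a pool.
type node struct {
	CurrentConfig string
	DesiredConfig string
	MCDState      string // assumed values: "Done", "Working"
	Unschedulable bool
}

// anyDaemonWorking: "at least one machine-config-daemon is working".
func anyDaemonWorking(nodes []node) bool {
	for _, n := range nodes {
		if n.MCDState == "Working" {
			return true
		}
	}
	return false
}

// anyPendingConfig: "a new desiredConfig has been applied to at least one
// node and it hasn't been reconciled".
func anyPendingConfig(nodes []node) bool {
	for _, n := range nodes {
		if n.CurrentConfig != n.DesiredConfig {
			return true
		}
	}
	return false
}

// anyNotFullyUpdated: "at least one machine in the pool hasn't been updated
// completely yet", measured against the pool's target config.
func anyNotFullyUpdated(nodes []node, poolConfig string) bool {
	for _, n := range nodes {
		if n.CurrentConfig != poolConfig || n.MCDState != "Done" {
			return true
		}
	}
	return false
}

// notEverythingDone approximates today's behavior: anything that keeps a
// node from counting as updated and ready, including a cordon, counts.
func notEverythingDone(nodes []node, poolConfig string) bool {
	for _, n := range nodes {
		if n.Unschedulable {
			return true
		}
	}
	return anyNotFullyUpdated(nodes, poolConfig)
}

func main() {
	const target = "rendered-worker-a" // hypothetical config name
	pool := []node{
		{CurrentConfig: target, DesiredConfig: target, MCDState: "Done"},
		{CurrentConfig: target, DesiredConfig: target, MCDState: "Done", Unschedulable: true},
	}
	fmt.Println("daemon working:   ", anyDaemonWorking(pool))
	fmt.Println("pending config:   ", anyPendingConfig(pool))
	fmt.Println("not fully updated:", anyNotFullyUpdated(pool, target))
	fmt.Println("not everything done:", notEverythingDone(pool, target))
}
```

      Under these assumptions, only the last predicate reports "updating" for a pool whose sole anomaly is a manually cordoned node; the first three stay false until a config change is actually in flight.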

       

              Assignee: Unassigned
              Reporter: John Kyros (jkyros@redhat.com)