OCPBUGS-68336

When adding new nodes, MCD executes commands after setting the nodes' state as Done


      Description of problem:

      The MCD keeps executing commands after it sets the node's state to Done.
          

      Version-Release number of selected component (if applicable):

      4.21
          

      How reproducible:

      Always
          

      Steps to Reproduce:

          1. Scale up a machineset to add a new node to the cluster
          2. Check the MCD logs for the new node
          
          

      Actual results:

      
      The MCO reports the node's state as Done, but the MCD then continues to execute several commands, such as rpm-ostree cleanup -r and rpm-ostree kargs, and apparently several SSH key tasks as well.
      
      This is the MCD log from the new node:
      
      
      {noformat}
      I1215 11:15:33.144605    2623 daemon.go:1773] Current+desired config: rendered-worker-01a6f09f7f4651703c4ea4be16e330c3
      I1215 11:15:33.144626    2623 daemon.go:1788] state: Done
      I1215 11:15:33.144658    2623 update.go:2710] Running: rpm-ostree cleanup -r
      Bootloader updated; bootconfig swap: yes; bootversion: boot.1.1, deployment count change: -1
      Pruned images: 1 (layers: 51)
      Freed: 529.4 MB (pkgcache branches: 0)
      I1215 11:16:20.020840    2623 update.go:2755] "No bootstrap pivot required; unlinking bootstrap node annotations"
      I1215 11:16:20.023314    2623 daemon.go:2259] Validating against current config rendered-worker-01a6f09f7f4651703c4ea4be16e330c3
      I1215 11:16:20.023574    2623 daemon.go:2144] SSH key location update required. Moving SSH keys from "/home/core/.ssh/authorized_keys" to "/home/core/.ssh/authorized_keys.d/ignition".
      I1215 11:16:20.037202    2623 update.go:2306] updating SSH keys
      I1215 11:16:20.037389    2623 file_writers.go:359] Retrieved UserId: 1000 for username: core
      I1215 11:16:20.039812    2623 file_writers.go:369] Retrieved GroupID: 1000 for group: core
      I1215 11:16:20.039825    2623 update.go:2207] Writing SSH keys to "/home/core/.ssh/authorized_keys.d/ignition"
      I1215 11:16:20.039862    2623 update.go:2172] Creating missing SSH key dir at "/home/core/.ssh/authorized_keys.d"
      I1215 11:16:20.076576    2623 update.go:2241] Wrote SSH keys to "/home/core/.ssh/authorized_keys.d/ignition"
      I1215 11:16:20.076593    2623 command_runner.go:24] Running captured: rpm-ostree kargs
      I1215 11:16:20.136499    2623 update.go:2755] "Validated on-disk state"
      I1215 11:16:20.163923    2623 daemon.go:2368] System state unchanged: MachineConfig: rendered-worker-01a6f09f7f4651703c4ea4be16e330c3
      I1215 11:16:30.187992    2623 update.go:2755] "Update completed for config rendered-worker-01a6f09f7f4651703c4ea4be16e330c3 and node has been successfully uncordoned"
      
      {noformat}
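
      The ordering problem in the log above can be checked mechanically. The following is a minimal sketch, not part of the MCO: it assumes the klog line layout seen in the excerpt and assumes that commands are announced with the "Running: " or "Running captured: " prefixes shown there. The function name commands_after_done and the SAMPLE string are illustrative only.

```python
import re

# klog header: severity letter + MMDD, time, PID, source file:line, then the message.
KLOG_RE = re.compile(
    r"^[IWEF]\d{4} (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+\d+ (?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)


def commands_after_done(log_text):
    """Return (time, command) pairs for commands logged after 'state: Done'."""
    seen_done = False
    found = []
    for raw in log_text.splitlines():
        m = KLOG_RE.match(raw.strip())
        if not m:
            continue  # command output lines carry no klog header
        msg = m.group("msg")
        if msg == "state: Done":
            seen_done = True
        elif seen_done and msg.startswith(("Running: ", "Running captured: ")):
            found.append((m.group("time"), msg.split(": ", 1)[1]))
    return found


# Shortened copy of the log excerpt from this report.
SAMPLE = """\
I1215 11:15:33.144626    2623 daemon.go:1788] state: Done
I1215 11:15:33.144658    2623 update.go:2710] Running: rpm-ostree cleanup -r
Pruned images: 1 (layers: 51)
I1215 11:16:20.076593    2623 command_runner.go:24] Running captured: rpm-ostree kargs
"""

for when, cmd in commands_after_done(SAMPLE):
    print(when, cmd)
```

      Run against the shortened excerpt, this lists both rpm-ostree invocations, which is the behaviour this bug describes. The SSH key tasks are not announced with a "Running" prefix, so this sketch does not report them.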
      
      
      
          

      Expected results:

      
      If the node's state is "Done", we don't expect the MCD to be executing tasks.
      
          

      Additional info:

      
      When we remove a node (scale down), a terminate signal is sent to the MCD, and if the MCD is running an rpm-ostree command at that moment, the pool is temporarily degraded.
      
      In our tests we wait for the node to report Done before removing it, but the node is not actually done: the MCD still runs the cleanup command, which makes the tests unstable when we scale down the node.
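
      Until this is fixed, a test could wait on the MCD log instead of on the Done state alone. The sketch below is only an illustration under stated assumptions: it treats the "Update completed for config" message, the last line in the log above, as the end of MCD activity, which is inferred from that one excerpt and not confirmed. fetch_log is a hypothetical callable that returns the MCD log of the node as a string.

```python
import time


def mcd_finished(log_text):
    """True once an 'Update completed for config' line appears after 'state: Done'."""
    done_at = log_text.find("] state: Done")
    if done_at == -1:
        return False
    return "Update completed for config" in log_text[done_at:]


def wait_for_mcd(fetch_log, timeout=300.0, interval=5.0, sleep=time.sleep):
    """Poll fetch_log() (hypothetical helper) until mcd_finished() is true.

    Returns True if the log shows completion, False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if mcd_finished(fetch_log()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

      A scale-down test would call wait_for_mcd before removing the node, so that the terminate signal does not arrive while rpm-ostree cleanup -r is still running.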
      
          

              team-mco Team MCO
              sregidor@redhat.com Sergio Regidor de la Rosa