Bug
Resolution: Unresolved
Normal
4.21
Low
Description of problem:
The MCD continues executing commands after it sets the node's state to Done.
Version-Release number of selected component (if applicable):
4.21
How reproducible:
Always
Steps to Reproduce:
1. Scale up a machineset to add a new node to the cluster
2. Check the MCD logs for the new node
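For example (the machineset name, replica count, and pod name below are placeholders for your cluster, not values from this report):

```shell
# Scale up a worker machineset to add a node
oc -n openshift-machine-api scale machineset <machineset-name> --replicas=<n+1>

# Once the new node joins, inspect its machine-config-daemon logs for
# commands executed after the daemon reports "state: Done"
oc -n openshift-machine-config-operator logs <machine-config-daemon-pod> \
  -c machine-config-daemon | grep -E 'state: Done|Running'
```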
Actual results:
The MCO sets the node's state to Done, but after that the MCD continues executing several commands, such as rpm-ostree cleanup and rpm-ostree kargs, and it appears to run several SSH key tasks as well.
This is the log:
{noformat}
I1215 11:15:33.144605 2623 daemon.go:1773] Current+desired config: rendered-worker-01a6f09f7f4651703c4ea4be16e330c3
I1215 11:15:33.144626 2623 daemon.go:1788] state: Done
I1215 11:15:33.144658 2623 update.go:2710] Running: rpm-ostree cleanup -r
Bootloader updated; bootconfig swap: yes; bootversion: boot.1.1, deployment count change: -1
Pruned images: 1 (layers: 51)
Freed: 529.4 MB (pkgcache branches: 0)
I1215 11:16:20.020840 2623 update.go:2755] "No bootstrap pivot required; unlinking bootstrap node annotations"
I1215 11:16:20.023314 2623 daemon.go:2259] Validating against current config rendered-worker-01a6f09f7f4651703c4ea4be16e330c3
I1215 11:16:20.023574 2623 daemon.go:2144] SSH key location update required. Moving SSH keys from "/home/core/.ssh/authorized_keys" to "/home/core/.ssh/authorized_keys.d/ignition".
I1215 11:16:20.037202 2623 update.go:2306] updating SSH keys
I1215 11:16:20.037389 2623 file_writers.go:359] Retrieved UserId: 1000 for username: core
I1215 11:16:20.039812 2623 file_writers.go:369] Retrieved GroupID: 1000 for group: core
I1215 11:16:20.039825 2623 update.go:2207] Writing SSH keys to "/home/core/.ssh/authorized_keys.d/ignition"
I1215 11:16:20.039862 2623 update.go:2172] Creating missing SSH key dir at "/home/core/.ssh/authorized_keys.d"
I1215 11:16:20.076576 2623 update.go:2241] Wrote SSH keys to "/home/core/.ssh/authorized_keys.d/ignition"
I1215 11:16:20.076593 2623 command_runner.go:24] Running captured: rpm-ostree kargs
I1215 11:16:20.136499 2623 update.go:2755] "Validated on-disk state"
I1215 11:16:20.163923 2623 daemon.go:2368] System state unchanged: MachineConfig: rendered-worker-01a6f09f7f4651703c4ea4be16e330c3
I1215 11:16:30.187992 2623 update.go:2755] "Update completed for config rendered-worker-01a6f09f7f4651703c4ea4be16e330c3 and node has been successfully uncordoned"
{noformat}
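A quick way to surface this ordering is a small awk sketch (not part of the MCO) that prints every command the daemon ran after it reported "state: Done". The sample lines below are copied from the log above; point it at a real MCD log instead:

```shell
# Sample MCD log lines taken from the excerpt above
cat > /tmp/mcd-sample.log <<'EOF'
I1215 11:15:33.144626 2623 daemon.go:1788] state: Done
I1215 11:15:33.144658 2623 update.go:2710] Running: rpm-ostree cleanup -r
I1215 11:16:20.076593 2623 command_runner.go:24] Running captured: rpm-ostree kargs
EOF

# Print commands executed after the daemon reported "state: Done";
# an empty result would mean no late commands were run
awk '/state: Done/ {done=1; next} done && /Running/ {print}' /tmp/mcd-sample.log
```

On the excerpt above this prints both the rpm-ostree cleanup and rpm-ostree kargs invocations, i.e. two commands run after the Done transition.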
Expected results:
If the node's state is "Done", the MCD should not be executing any further tasks.
Additional info:
When we remove a node (scale down), a terminate signal is sent to the MCD; if the MCD is running an rpm-ostree command at that moment, the pool is temporarily degraded.
In our testing we wait for the node to reach the Done state before removing it, but the node is not actually done: it still executes the cleanup command, which leads to test instability when we scale down the node.
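As a stopgap, our wait condition could check more than the state annotation before scaling down, e.g. also confirm no rpm-ostree transaction is still in flight on the node (a sketch; node name is a placeholder):

```shell
# The standard MCO state annotation alone is not sufficient, per the log above
oc get node <node> \
  -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/state}'

# Additionally check that rpm-ostree is idle on the node before removing it
oc debug node/<node> -- chroot /host rpm-ostree status
```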