-
Story
-
Resolution: Unresolved
-
Undefined
-
None
-
None
As the MCD is currently written, it spawns a few goroutines independent of the main syncNode code. These goroutines do things such as monitor for SSH logins, monitor the Kubelet healthz endpoint, nodewriter, metrics listener, etc. These all accept a stopCh parameter which appears to be used for shutting down that goroutine. It also appears that the intent is that if that channel is closed, all listening goroutines will shut down.
However, this is not what is actually happening:
- In cmd/machine-config-daemon/start.go, we create a stop channel and pass that into several things that depend on that channel: https://github.com/openshift/machine-config-operator/blob/master/cmd/machine-config-daemon/start.go#L181
- We also pass stopCh into the Daemon when we start it (https://github.com/openshift/machine-config-operator/blob/master/pkg/daemon/daemon.go#L605) and, indeed, we have a stopCh field on the Daemon struct (https://github.com/openshift/machine-config-operator/blob/master/pkg/daemon/daemon.go#L100-L101). However, this field on the Daemon struct is never initialized ($ rg -g "!*_test.go" -F "stopCh" ./pkg/daemon). This compiles because nil channels (that is, channels that were never initialized (make(chan struct{})) is still valid Golang: https://go.dev/play/p/7LCBAkgrFbn. However, if anything were to try to close dn.stopCh, the MCD would panic.
At a minimum, we need to make sure that dn.stopCh is initialized with the stopCh from cmd/machine-config-daemon/start.go. However, I think the preferred solution would be to use contexts as they are a more idiomatic way of controlling multiple goroutines (https://go.dev/play/p/rl8BP7BHMGV (notice that all of the goroutines that are spawned eventually shut down).
- is related to
-
MCO-1154 Clean up MCD goroutine channel handling
- To Do