Bug
Resolution: Cannot Reproduce
Normal
4.12.z
Quality / Stability / Reliability
False
Moderate
No
MCO Sprint 241, MCO Sprint 242, MCO Sprint 243, MCO Sprint 244
4
Description of problem:
The machine-config operator was found in a Degraded state with reason MachineConfigControllerFailed.
~~~
- lastTransitionTime: '2023-08-14T01:54:52Z'
  message: 'Failed to resync 4.12.26 because: error during waitForControllerConfigToBeCompleted:
    [timed out waiting for the condition, controllerconfig is not completed: ControllerConfig
    has not completed: completed(false) running(false) failing(true)]'
  reason: MachineConfigControllerFailed
  status: 'True'
  type: Degraded
~~~
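The condition message reports the ControllerConfig status as three flags. The Go sketch below extracts them, assuming only the message format shown above. The type and function names are hypothetical and are not part of the MCO code base. Read this way, `failing(true)` together with `running(false)` and `completed(false)` describes a failed sync rather than one still in progress.

```go
package main

import (
	"fmt"
	"regexp"
)

// controllerConfigState holds the three flags reported in the
// "ControllerConfig has not completed" message (hypothetical type).
type controllerConfigState struct {
	Completed, Running, Failing bool
}

// flagRE matches fragments such as "completed(false)" in the message text.
var flagRE = regexp.MustCompile(`(completed|running|failing)\((true|false)\)`)

// parseControllerConfigState extracts the completed/running/failing flags
// from a Degraded condition message. ok is false if no flag was found.
func parseControllerConfigState(msg string) (state controllerConfigState, ok bool) {
	for _, m := range flagRE.FindAllStringSubmatch(msg, -1) {
		v := m[2] == "true"
		switch m[1] {
		case "completed":
			state.Completed = v
		case "running":
			state.Running = v
		case "failing":
			state.Failing = v
		}
		ok = true
	}
	return state, ok
}

func main() {
	// Message fragment copied from the Degraded condition above.
	msg := "ControllerConfig has not completed: completed(false) running(false) failing(true)"
	state, ok := parseControllerConfigState(msg)
	fmt.Printf("parsed=%v completed=%v running=%v failing=%v\n",
		ok, state.Completed, state.Running, state.Failing)
}
```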
The machine-config-controller logs show the controller failing to read the /etc/mcc/templates directory and its master subdirectory:
~~~
2023-08-14T04:29:47.125008119Z I0814 04:29:47.124988 1 render_controller.go:377] Error syncing machineconfigpool worker: ControllerConfig has not completed: completed(false) running(false) failing(true)
2023-08-14T04:30:03.142875064Z E0814 04:30:03.142826 1 template_controller.go:426] failed to read dir "/etc/mcc/templates": readdirent /etc/mcc/templates: no such file or directory
2023-08-14T04:30:03.142875064Z I0814 04:30:03.142851 1 template_controller.go:427] Dropping controllerconfig "machine-config-controller" out of the queue: failed to read dir "/etc/mcc/templates": readdirent /etc/mcc/templates: no such file or directory
2023-08-14T04:30:27.948653516Z I0814 04:30:27.948609 1 container_runtime_config_controller.go:364] Error syncing image config openshift-config: could not Create/Update MachineConfig: could not generate original ContainerRuntime Configs: generateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/master": readdirent /etc/mcc/templates/master: no such file or directory
~~~
The customer recently upgraded the cluster from 4.10.61 to 4.12.26. On the running cluster, the customer observed that the machine-config, network, and sample operators were stuck in a Degraded state.
The issue was resolved by deleting the machine-config-controller ControllerConfig and the network and sample operator pods.
We are looking for an RCA of what caused the issue. No pruning activity was performed on the cluster.
Version-Release number of selected component (if applicable):
4.12.26
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info: