-
Bug
-
Resolution: Done-Errata
-
Major
-
4.15.0
-
Moderate
-
No
-
MCO Sprint 247, MCO Sprint 248, MCO Sprint 249, MCO Sprint 250, MCO Sprint 252, MCO Sprint 253
-
6
-
Rejected
-
False
-
-
Release Note Not Required
-
Done
-
This bug tracks the intermittent "/etc/docker/certs.d not found" error that occasionally causes nodes to be marked degraded.
After fixing https://issues.redhat.com/browse/OCPBUGS-19722, I noticed a few additional log messages in the controller showing that it was failing to get the ControllerConfig during cluster installation.
I1005 08:32:43.003013 1 container_runtime_config_controller.go:417] Error syncing image config openshift-config: could not get ControllerConfig controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
I1005 08:32:44.284624 1 container_runtime_config_controller.go:417] Error syncing image config openshift-config: could not get ControllerConfig controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
I1005 08:32:46.735315 1 render_controller.go:377] Error syncing machineconfigpool master: controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
I1005 08:32:46.735386 1 render_controller.go:377] Error syncing machineconfigpool worker: controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
I1005 08:32:46.755690 1 render_controller.go:377] Error syncing machineconfigpool master: controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
I1005 08:32:46.755751 1 render_controller.go:377] Error syncing machineconfigpool worker: controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
I also noticed the following in the daemon logs, but they appear to predate the fix made in the above PR.
E1004 15:10:37.497119 12299 writer.go:226] Marking Degraded due to: open /etc/docker/certs.d: no such file or directory
E1004 15:10:38.807323 12299 writer.go:226] Marking Degraded due to: open /etc/docker/certs.d: no such file or directory
E1004 15:10:41.392855 12299 writer.go:226] Marking Degraded due to: open /etc/docker/certs.d: no such file or directory
E1004 15:10:46.544369 12299 writer.go:226] Marking Degraded due to: open /etc/docker/certs.d: no such file or directory
E1004 15:10:56.815668 12299 writer.go:226] Marking Degraded due to: open /etc/docker/certs.d: no such file or directory
This manifests as the following in the controller:
I1005 08:32:54.162695 1 status.go:126] Degraded Machine: ip-10-0-89-70.us-east-2.compute.internal and Degraded Reason: open /etc/docker/certs.d: no such file or directory
I1005 08:32:54.162712 1 status.go:126] Degraded Machine: ip-10-0-1-133.us-east-2.compute.internal and Degraded Reason: open /etc/docker/certs.d: no such file or directory
I1005 08:32:54.162724 1 status.go:126] Degraded Machine: ip-10-0-60-194.us-east-2.compute.internal and Degraded Reason: open /etc/docker/certs.d: no such file or directory
I1005 08:32:54.174177 1 kubelet_config_features.go:118] Applied FeatureSet cluster on MachineConfigPool master
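For triage, the controller excerpt above can be summarized by grouping degraded machines by reason. The following is only a rough sketch, not part of the MCO: the helper names (split_entries, degraded_by_reason) are hypothetical, and it assumes the header layout seen in these excerpts (severity letter, MMDD, timestamp with microseconds, PID, file:line]). It also tolerates entries that were pasted without a newline between them.

```python
import re
from collections import defaultdict

# Zero-width match just before each log header, e.g. "I1005 08:32:54.162695 ".
# Assumption: headers look like the ones in the excerpts above.
ENTRY_START = re.compile(r"(?=[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6} )")

# Message format emitted at status.go:126 in the excerpt above.
DEGRADED = re.compile(
    r"Degraded Machine: (?P<node>\S+) and Degraded Reason: (?P<reason>.*)"
)


def split_entries(text):
    """Split log text into entries, even if several are fused on one line."""
    return [part.strip() for part in ENTRY_START.split(text) if part.strip()]


def degraded_by_reason(text):
    """Map each degraded reason to the list of machines reporting it."""
    grouped = defaultdict(list)
    for entry in split_entries(text):
        match = DEGRADED.search(entry)
        if match:
            grouped[match.group("reason").strip()].append(match.group("node"))
    return dict(grouped)


if __name__ == "__main__":
    import sys

    # Usage: python degraded_summary.py < machine-config-controller.log
    for reason, nodes in degraded_by_reason(sys.stdin.read()).items():
        print(f"{reason}: {len(nodes)} machine(s)")
        for node in nodes:
            print(f"  {node}")
```

Run against a saved controller log, this would show whether every degraded report during installation shares the same /etc/docker/certs.d reason or whether other reasons are mixed in.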
None of these seem fatal; they show up during installation and go away as the installation completes. We may end up needing to do nothing, as this could be a completely harmless timing issue, but it does seem worth taking a closer look. I'll attach the full log to this bug.
- blocks
-
OCPBUGS-33643 Nodes being marked degraded due to /etc/docker/certs.d not being found
- Closed
- is blocked by
-
OCPBUGS-33412 Nodes being marked degraded due to /etc/docker/certs.d not being found
- Closed
- is cloned by
-
OCPBUGS-33412 Nodes being marked degraded due to /etc/docker/certs.d not being found
- Closed
-
OCPBUGS-33418 Investigate timing issues in machine-config-controller
- Closed
-
OCPBUGS-33643 Nodes being marked degraded due to /etc/docker/certs.d not being found
- Closed
- relates to
-
MCO-531 General "tech debt" items that don't have a home yet
- New
-
OCPBUGS-29284 Adding cloudCA certificate is taking too long in clusters with no capabilities enabled
- Closed
- links to
-
RHBA-2024:2865 OpenShift Container Platform 4.15.z bug fix update