OpenShift Bugs / OCPBUGS-33643

Nodes being marked degraded due to /etc/docker/certs.d not being found


    • Moderate
    • No
    • MCO Sprint 253
    • 1
    • Rejected
    • False
    • Release Note Not Required
    • Done

      This is a clone of issue OCPBUGS-20152. The following is the description of the original issue:

      This bug focuses on the /etc/docker/certs.d not found issue that is causing nodes to be marked degraded occasionally.

       

      As a result of fixing https://issues.redhat.com/browse/OCPBUGS-19722 , I noticed a few additional logs in the controller where it was failing to get controllerconfig during cluster installation.

      I1005 08:32:43.003013 1 container_runtime_config_controller.go:417] Error syncing image config openshift-config: could not get ControllerConfig controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
      I1005 08:32:44.284624 1 container_runtime_config_controller.go:417] Error syncing image config openshift-config: could not get ControllerConfig controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
      I1005 08:32:46.735315 1 render_controller.go:377] Error syncing machineconfigpool master: controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
      I1005 08:32:46.735386 1 render_controller.go:377] Error syncing machineconfigpool worker: controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
      I1005 08:32:46.755690 1 render_controller.go:377] Error syncing machineconfigpool master: controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
      I1005 08:32:46.755751 1 render_controller.go:377] Error syncing machineconfigpool worker: controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found

      I also noticed these in the daemon logs, but they appear to predate the fix made in the above PR.

      E1004 15:10:37.497119   12299 writer.go:226] Marking Degraded due to: open /etc/docker/certs.d: no such file or directory
      E1004 15:10:38.807323   12299 writer.go:226] Marking Degraded due to: open /etc/docker/certs.d: no such file or directory
      E1004 15:10:41.392855   12299 writer.go:226] Marking Degraded due to: open /etc/docker/certs.d: no such file or directory
      E1004 15:10:46.544369   12299 writer.go:226] Marking Degraded due to: open /etc/docker/certs.d: no such file or directory
      E1004 15:10:56.815668   12299 writer.go:226] Marking Degraded due to: open /etc/docker/certs.d: no such file or directory 
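
The error text above is what a directory read reports when the path does not exist yet. As a minimal sketch only (written in Python rather than the MCO's Go, and not the fix that was actually applied), one way a reader of such a directory could treat "not created yet" as "no entries" instead of as a failure is shown below. The helper name `list_cert_dirs` is hypothetical.

```python
import os
import tempfile


def list_cert_dirs(path):
    """Return the sorted entry names under path, or [] if path is absent.

    Hypothetical helper for illustration; it is not part of the MCO.
    """
    try:
        return sorted(os.listdir(path))
    except FileNotFoundError:
        # The directory has not been created yet. Report "no entries"
        # rather than raising, so a caller does not treat this as fatal.
        return []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        certs_dir = os.path.join(root, "certs.d")
        print(list_cert_dirs(certs_dir))  # directory is missing here
        os.makedirs(os.path.join(certs_dir, "registry.example.com"))
        print(list_cert_dirs(certs_dir))  # directory now has one entry
```

Whether swallowing the error is appropriate depends on whether a missing directory is ever a real fault on the node; that question is what this bug asks to be investigated.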

      This manifests as the following in the controller:

       

      I1005 08:32:54.162695       1 status.go:126] Degraded Machine: ip-10-0-89-70.us-east-2.compute.internal and Degraded Reason: open /etc/docker/certs.d: no such file or directory
      I1005 08:32:54.162712       1 status.go:126] Degraded Machine: ip-10-0-1-133.us-east-2.compute.internal and Degraded Reason: open /etc/docker/certs.d: no such file or directory
      I1005 08:32:54.162724       1 status.go:126] Degraded Machine: ip-10-0-60-194.us-east-2.compute.internal and Degraded Reason: open /etc/docker/certs.d: no such file or directory
      I1005 08:32:54.174177       1 kubelet_config_features.go:118] Applied FeatureSet cluster on MachineConfigPool master
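
When triaging a full controller log, it can help to pull out which machines were reported degraded and why. The following is a rough sketch that assumes the `Degraded Machine: ... and Degraded Reason: ...` message shape seen in the excerpt above; the function name `degraded_machines` is made up for illustration. It also copes with log entries that were pasted without line breaks between them, by ending each reason at the next klog-style header.

```python
import re

# Matches the message body seen in the controller log excerpt. The reason
# ends at the next klog header (for example "I1005 08:32:54.162712") or at
# the end of the line, whichever comes first.
DEGRADED_RE = re.compile(
    r"Degraded Machine: (?P<machine>\S+) and Degraded Reason: (?P<reason>.*?)"
    r"(?=[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d+|$)",
    re.MULTILINE,
)


def degraded_machines(log_text):
    """Map each degraded machine name to the last reason logged for it."""
    result = {}
    for match in DEGRADED_RE.finditer(log_text):
        result[match.group("machine")] = match.group("reason").strip()
    return result


if __name__ == "__main__":
    sample = (
        "I1005 08:32:54.162695       1 status.go:126] Degraded Machine: "
        "ip-10-0-89-70.us-east-2.compute.internal and Degraded Reason: "
        "open /etc/docker/certs.d: no such file or directory"
        "I1005 08:32:54.162712       1 status.go:126] Degraded Machine: "
        "ip-10-0-1-133.us-east-2.compute.internal and Degraded Reason: "
        "open /etc/docker/certs.d: no such file or directory\n"
    )
    for machine, reason in degraded_machines(sample).items():
        print(machine, "->", reason)
```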

      None of these seem fatal; they show up during installation and go away as the installation completes. We may end up needing to do nothing, as this could be a completely harmless timing issue, but it does seem worth taking a closer look. I'll attach the full log to this bug.

            umohnani Urvashi Mohnani
            openshift-crt-jira-prow OpenShift Prow Bot
            Sergio Regidor de la Rosa Sergio Regidor de la Rosa