Cluster doesn't come up after cert rotation

    • Bug
    • Resolution: Done-Errata
    • Major
    • 4.19.z
    • 4.19.0
    • kube-apiserver
    • None
    • Quality / Stability / Reliability
    • False
    • None
    • None
    • Important
    • None
    • None
    • None
    • In Progress
    • Bug Fix
    • Previously, the kubelet server cert was not updated after cert rotation due to unauthorized access to the API server, causing the cluster to start in an unhealthy state. With this release, the kubelet server cert is updated after cert rotation, ensuring a healthy cluster state. (link:https://issues.redhat.com/browse/OCPBUGS-58116[OCPBUGS-58116])
    • None
    • None
    • None
    • None

      This is a clone of issue OCPBUGS-56551. The following is the description of the original issue:

      Description of problem:

         I am trying out the 4.19.0-rc.2 bits, and it looks like the kubelet server cert is not updated after cert rotation, so the cluster doesn't come up in a healthy state.

      Version-Release number of selected component (if applicable):

      $ openshift-install version
      openshift-install 4.19.0-rc.2
      built from commit fcbbb9444f0001a8b704fdc0cdf85a459290271d
      release image quay.io/openshift-release-dev/ocp-release@sha256:596f4d804654419241c1894fb6d54066718f254aab58dfa8892bb26390ba3df9
      release architecture amd64
      

      How reproducible:

          Every time I try to force cert rotation.

      Steps to Reproduce:

          1. Manually set the system clock back by one day (so that cert rotation is triggered later)
          2. Provision the cluster using openshift-install
          3. Restart the system with the clock back in the present and wait for cert rotation to kick in
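The clock changes in steps 1 and 3 can be sketched as below. The report does not say how the clock was moved, so the use of `timedatectl` here is an assumption, and `one_day_ago`, `clock_back_one_day`, and `clock_to_present` are hypothetical helper names. The sketch also assumes GNU `date` and a systemd host.

```shell
# Hypothetical helper: timestamp for "one day ago" in the
# "YYYY-MM-DD HH:MM:SS" form that timedatectl set-time accepts.
# Relies on GNU date's -d option.
one_day_ago() {
  date -d '1 day ago' '+%Y-%m-%d %H:%M:%S'
}

# Step 1 (assumed mechanism): stop time synchronization, then move
# the system clock back by one day.
clock_back_one_day() {
  sudo timedatectl set-ntp false
  sudo timedatectl set-time "$(one_day_ago)"
}

# Step 3 (assumed mechanism): re-enable time synchronization so the
# clock returns to the present.
clock_to_present() {
  sudo timedatectl set-ntp true
}
```

Step 2 (provisioning with openshift-install) would run between `clock_back_one_day` and `clock_to_present`.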
          

      Actual results:

      $ journalctl -u kubelet
      [...]
      May 21 03:57:11 crc kubenswrapper[2837]: E0521 03:57:11.439229    2837 controller.go:145] "Failed to ensure lease exists, will retry" err="Unauthorized" interval="7s"
      May 21 03:57:11 crc kubenswrapper[2837]: I0521 03:57:11.836112    2837 csi_plugin.go:887] Failed to contact API server when waiting for CSINode publishing: Unauthorized
      May 21 03:57:11 crc kubenswrapper[2837]: E0521 03:57:11.845286    2837 eviction_manager.go:292] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"crc\" not found"
      May 21 03:57:12 crc kubenswrapper[2837]: I0521 03:57:12.836648    2837 csi_plugin.go:887] Failed to contact API server when waiting for CSINode publishing: Unauthorized
      May 21 03:57:13 crc kubenswrapper[2837]: I0521 03:57:13.836289    2837 csi_plugin.go:887] Failed to contact API server when waiting for CSINode publishing: Unauthorized
      May 21 03:57:14 crc kubenswrapper[2837]: I0521 03:57:14.835712    2837 csi_plugin.go:887] Failed to contact API server when waiting for CSINode publishing: Unauthorized
      May 21 03:57:15 crc kubenswrapper[2837]: I0521 03:57:15.836484    2837 csi_plugin.go:887] Failed to contact API server when waiting for CSINode publishing: Unauthorized
      May 21 03:57:16 crc kubenswrapper[2837]: I0521 03:57:16.836607    2837 csi_plugin.go:887] Failed to contact API server when waiting for CSINode publishing: Unauthorized
      May 21 03:57:17 crc kubenswrapper[2837]: I0521 03:57:17.835684    2837 csi_plugin.go:887] Failed to contact API server when waiting for CSINode publishing: Unauthorized
      May 21 03:57:18 crc kubenswrapper[2837]: I0521 03:57:18.169296    2837 kubelet_node_status.go:413] "Setting node annotation to enable volume controller attach/detach"
      May 21 03:57:18 crc kubenswrapper[2837]: I0521 03:57:18.170542    2837 kubelet_node_status.go:736] "Recording event message for node" node="crc" event="NodeHasSufficientMemory"
      May 21 03:57:18 crc kubenswrapper[2837]: I0521 03:57:18.170671    2837 kubelet_node_status.go:736] "Recording event message for node" node="crc" event="NodeHasNoDiskPressure"
      May 21 03:57:18 crc kubenswrapper[2837]: I0521 03:57:18.170686    2837 kubelet_node_status.go:736] "Recording event message for node" node="crc" event="NodeHasSufficientPID"
      May 21 03:57:18 crc kubenswrapper[2837]: I0521 03:57:18.170710    2837 kubelet_node_status.go:78] "Attempting to register node" node="crc"
      May 21 03:57:18 crc kubenswrapper[2837]: E0521 03:57:18.177260    2837 kubelet_node_status.go:110] "Unable to register node with API server" err="Unauthorized" node="crc"
      May 21 03:57:18 crc kubenswrapper[2837]: E0521 03:57:18.445338    2837 controller.go:145] "Failed to ensure lease exists, will retry" err="Unauthorized" interval="7s"
      May 21 03:57:18 crc kubenswrapper[2837]: I0521 03:57:18.834898    2837 csi_plugin.go:887] Failed to contact API server when waiting for CSINode publishing: Unauthorized
      May 21 03:57:19 crc kubenswrapper[2837]: I0521 03:57:19.836862    2837 csi_plugin.go:887] Failed to contact API server when waiting for CSINode publishing: Unauthorized
      May 21 03:57:20 crc kubenswrapper[2837]: I0521 03:57:20.004614    2837 kubelet_node_status.go:413] "Setting node annotation to enable volume controller attach/detach"
      May 21 03:57:20 crc kubenswrapper[2837]: I0521 03:57:20.005764    2837 kubelet_node_status.go:736] "Recording event message for node" node="crc" event="NodeHasSufficientMemory"
      May 21 03:57:20 crc kubenswrapper[2837]: I0521 03:57:20.005820    2837 kubelet_node_status.go:736] "Recording event message for node" node="crc" event="NodeHasNoDiskPressure"
      May 21 03:57:20 crc kubenswrapper[2837]: I0521 03:57:20.005846    2837 kubelet_node_status.go:736] "Recording event message for node" node="crc" event="NodeHasSufficientPID"
      May 21 03:57:20 crc kubenswrapper[2837]: E0521 03:57:20.006175    2837 kubelet.go:3205] "No need to create a mirror pod, since failed to get node info from the cluster" err="node \"crc\" not found" node="crc"
      May 21 03:57:20 crc kubenswrapper[2837]: I0521 03:57:20.839764    2837 csi_plugin.go:887] Failed to contact API server when waiting for CSINode publishing: Unauthorized
      May 21 03:57:21 crc kubenswrapper[2837]: I0521 03:57:21.837837    2837 csi_plugin.go:887] Failed to contact API server when waiting for CSINode publishing: Unauthorized
      May 21 03:57:21 crc kubenswrapper[2837]: E0521 03:57:21.846350    2837 eviction_manager.go:292] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"crc\" not found"
      May 21 03:57:22 crc kubenswrapper[2837]: I0521 03:57:22.836404    2837 csi_plugin.go:887] Failed to contact API server when waiting for CSINode publishing: Unauthorized
      
      
      $ oc get csr
      <nothing in pending>
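The journal excerpt above repeats the same few messages, so tallying the Unauthorized errors by the source location that logged them can make the pattern easier to see. A minimal sketch follows; `summarize_unauthorized` is a hypothetical helper name, not something from the report.

```shell
# Hypothetical helper: read kubelet journal lines on stdin and print
# "<count> <file.go:line>" for every message containing "Unauthorized",
# most frequent first.
summarize_unauthorized() {
  awk '/Unauthorized/ {
         # The first "name.go:NNN" token on a klog line is the call site.
         if (match($0, /[a-z_]+\.go:[0-9]+/)) {
           count[substr($0, RSTART, RLENGTH)]++
         }
       }
       END { for (src in count) print count[src], src }' | sort -rn
}
```

It could be fed from the same command used above, for example `journalctl -u kubelet --no-pager | summarize_unauthorized`.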

      Expected results:

          Cert rotation should succeed and the cluster should come up in a healthy state.
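One way to confirm whether rotation actually replaced the kubelet server cert is to inspect its expiry on the node. This is a minimal sketch: `cert_not_after` and `cert_valid_for` are hypothetical helper names, and the path in the usage note is the kubelet's default PKI location, which is an assumption because the report does not mention it.

```shell
# Hypothetical helper: print the notAfter date of a PEM certificate.
cert_not_after() {
  openssl x509 -noout -enddate -in "$1" | cut -d= -f2
}

# Hypothetical helper: succeed only if the certificate stays valid for
# at least the given number of seconds (default 0, i.e. not yet expired).
cert_valid_for() {
  openssl x509 -noout -checkend "${2:-0}" -in "$1" >/dev/null
}
```

On a node, this might be used as `cert_not_after /var/lib/kubelet/pki/kubelet-server-current.pem`. An expiry date that does not move forward after the expected rotation would be consistent with the behavior described in this report.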

      Additional info:

      To test locally on a Linux system or any GCP instance with nested virtualization enabled, use the release-4.19 branch of https://github.com/crc-org/snc.
      
      ```
      $ ./snc.sh
      ```

              vrutkovs@redhat.com Vadim Rutkovsky
              openshift-crt-jira-prow OpenShift Prow Bot
              None
              None
              Ke Wang Ke Wang
              None