OpenShift API Server / API-1687

Impact cert issues after 4.14 to 4.15 upgrade


      Impact statement for the OCPBUGS-25821 series and the OCPBUGS-31384 series.

      Which 4.y.z to 4.y'.z' updates increase vulnerability?

      Updates from 4.14 to 4.15, for clusters born in 4.7 or earlier, until the issue is resolved.

      Which types of clusters?

      All clusters originally installed on OpenShift Container Platform (OCP) version 4.6 or earlier. Clusters installed on 4.7 or later and new 4.15 installs are unaffected.

      Note: because the CVO prunes update history, we cannot reliably detect born-in versions earlier than 4.9. The conditional update rule will therefore be updated to omit update recommendations for all clusters born in 4.8 or earlier.
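The born-in check described above can be illustrated with a small Python sketch. It is not the conditional update rule itself, only an approximation under stated assumptions: the input is the `status.history` list of the ClusterVersion object, each entry carries `version` and `startedTime` fields, and timestamps are RFC 3339 UTC strings that sort lexicographically. The function names and the `cutoff` parameter are hypothetical.

```python
import re


def parse_minor(version):
    """Return (major, minor) from a version string such as '4.6.12'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def oldest_recorded_version(history):
    """Return the version of the entry with the earliest startedTime.

    Assumption: timestamps share one RFC 3339 UTC format, so plain string
    comparison orders them correctly.
    """
    if not history:
        raise ValueError("empty history")
    return min(history, key=lambda entry: entry["startedTime"])["version"]


def recommendation_withheld(history, cutoff=(4, 8)):
    """True if the oldest recorded version is at or below the cutoff.

    Pruned history can hide the true born-in version, which is why the
    cutoff here is 4.8 rather than the 4.6/4.7 boundary of the bug itself.
    """
    return parse_minor(oldest_recorded_version(history)) <= cutoff
```

A cluster whose oldest surviving history entry is 4.6.8 would have the recommendation withheld, while one whose oldest entry is 4.12.0 would not.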

      What is the impact? Is it serious enough to warrant removing update recommendations?

      The Kubernetes API server operator will begin rolling a number of certificates, including the CA that signs api-int. Because pieces of the pipeline that rolls out the new CA to kubelets and other consumers are missing, the cluster locks up when the Kubernetes API servers transition to using the new cert/CA pair when serving incoming requests. For example, nodes may go NotReady because their kubelets cannot call in their status to an api-int signed by the new CA that they don't yet trust.
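To make the failure mode concrete, here is a minimal Python sketch that reports which serving CAs are absent from a consumer's trust bundle. It only compares certificates byte for byte by SHA-256 fingerprint and does not perform chain validation. The function names are hypothetical, and how the two PEM inputs are obtained from a cluster is left out.

```python
import base64
import hashlib
import re

# Matches the base64 body of each PEM certificate block.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----(.+?)-----END CERTIFICATE-----", re.DOTALL
)


def fingerprints(pem_text):
    """Return the SHA-256 fingerprints of every certificate in pem_text."""
    result = set()
    for body in PEM_BLOCK.findall(pem_text):
        der = base64.b64decode("".join(body.split()))
        result.add(hashlib.sha256(der).hexdigest())
    return result


def missing_cas(serving_ca_pem, kubelet_bundle_pem):
    """Fingerprints of serving CAs that the kubelet bundle does not contain."""
    return fingerprints(serving_ca_pem) - fingerprints(kubelet_bundle_pem)
```

A non-empty result from `missing_cas` corresponds to the situation described above: the API server presents a certificate signed by a CA that the kubelet has not been given.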

      How involved is remediation?

      We recovered a bitten cluster by:

      1. Manually injecting the new cert from openshift-config-managed's kube-apiserver-server-ca Secret into /var/lib/kubelet/kubeconfig on the control-plane nodes, and then restarting their kubelets with systemctl restart kubelet. This got the kubelets talking to api-int again.
      2. Deleting some pods in openshift-machine-api and maybe openshift-machine-config-operator that had been stuck in ContainerCreating. Their replacements came up smoothly.
      3. Deleting all the compute Machines to force reprovisions with working certs.
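Step 1 above can be sketched in Python as follows. This is an illustration, not the exact procedure used in the recovery. It assumes the kubeconfig stores its CA bundle inline in the standard `certificate-authority-data` field (base64-encoded PEM), and that the caller reads and writes the kubeconfig file itself. The function name is hypothetical.

```python
import base64


def add_ca_to_authority_data(authority_data_b64, new_ca_pem):
    """Return certificate-authority-data with new_ca_pem appended.

    The existing CAs are kept so that certificates signed by the old CA
    stay trusted. If the new CA is already present, the input is returned
    unchanged.
    """
    bundle = base64.b64decode(authority_data_b64).decode("ascii")
    new_ca = new_ca_pem.strip()
    if new_ca in bundle:
        return authority_data_b64
    if not bundle.endswith("\n"):
        bundle += "\n"
    bundle += new_ca + "\n"
    return base64.b64encode(bundle.encode("ascii")).decode("ascii")
```

The returned string would replace the old `certificate-authority-data` value, after which the kubelet still has to be restarted, as in step 1, to pick up the change.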

      Is this a regression?

      Yes, in OCP 4.15.

            vrutkovs@redhat.com Vadim Rutkovsky
            trking W. Trevor King