OpenShift Bugs / OCPBUGS-15853

Specifying non-existent secret for API namedCertificates renders inconsistent config and causes kube-apiserver crash-loop


    • Moderate
    • No
    • False

      This is a clone of issue OCPBUGS-8404. The following is the description of the original issue:

      Description of problem:

      If a custom API server certificate is added as described in the documentation[1], but the secret name is wrong and points to a non-existent secret, the following happens:
      - The kube-apiserver config is rendered with some of the namedCertificates pointing to /etc/kubernetes/static-pod-certs/secrets/user-serving-cert-000/.
      - Because the secret named in the apiserver/cluster object is wrong, no user-serving-cert-000 secret is generated, so the directory /etc/kubernetes/static-pod-certs/secrets/user-serving-cert-000/ does not exist (and may be removed automatically if it is created by hand).
      - Together, these two points cause kube-apiserver to crash-loop, because its config points to non-existent certificates.
      
      This is a cluster-kube-apiserver-operator bug: the operator should validate that the specified secret exists and, if it does not, report itself as degraded and change nothing, rather than render an inconsistent configuration.
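The requested check can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the operator's actual code: the names `SyncDecision` and `check_named_certificates`, the message text, and the plain-dict inputs are all hypothetical. Only the `names` / `servingCertificate.name` shape is taken from the documentation[1].

```python
from dataclasses import dataclass, field


@dataclass
class SyncDecision:
    """Outcome of validating namedCertificates before rendering config."""
    render: bool        # True only if every referenced secret exists
    degraded: bool      # True if the operator should report itself as degraded
    message: str = ""   # human-readable reason shown to the user
    missing: list = field(default_factory=list)


def check_named_certificates(named_certificates, existing_secrets):
    """Refuse to render config when a referenced secret is absent.

    named_certificates: list of dicts shaped like the apiserver/cluster spec,
        e.g. {"names": [...], "servingCertificate": {"name": "my-secret"}}.
    existing_secrets: set of secret names that actually exist (hypothetical
        input; a real operator would look these up through the API).
    """
    missing = []
    for entry in named_certificates:
        secret_name = entry.get("servingCertificate", {}).get("name", "")
        if secret_name not in existing_secrets:
            missing.append(secret_name)

    if missing:
        # Keep the previous, working configuration and surface the problem.
        return SyncDecision(
            render=False,
            degraded=True,
            message="namedCertificates reference missing secret(s): "
                    + ", ".join(sorted(missing)),
            missing=missing,
        )
    return SyncDecision(render=True, degraded=False)
```

The point of returning a decision object instead of raising is that the caller can keep serving the last good configuration while still reporting the degraded state.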
      

      Version-Release number of selected component (if applicable):

      First found in 4.11.13, but also reproduced in the latest nightly build.
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Set up a named certificate pointing to a secret that doesn't exist.
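For step 1, the payload follows the `spec.servingCerts.namedCertificates` shape that the documentation[1] uses for the apiserver/cluster object. The Python sketch below only builds that JSON merge patch; the helper name `build_named_cert_patch`, the hostname `api.example.com`, and the secret name `does-not-exist` are placeholders chosen for this example.

```python
import json


def build_named_cert_patch(hostname, secret_name):
    """Return a JSON merge patch for the apiserver/cluster object."""
    patch = {
        "spec": {
            "servingCerts": {
                "namedCertificates": [
                    {
                        "names": [hostname],
                        "servingCertificate": {"name": secret_name},
                    }
                ]
            }
        }
    }
    return json.dumps(patch)


if __name__ == "__main__":
    # Placeholder values: the secret name deliberately matches no real secret.
    print(build_named_cert_patch("api.example.com", "does-not-exist"))
```

The printed JSON should be usable as the `-p` argument of `oc patch apiserver cluster --type=merge`, the command form shown in the documentation.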
      

      Actual results:

      The rendered configuration is inconsistent: it points to a non-existent secret. The kube-apiserver pod crash-loops.
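One way to confirm this state on an affected master is to compare the certificate paths in the rendered config against the filesystem. The sketch below assumes the rendered config has been loaded into a dict and that named certificates sit under `servingInfo.namedCertificates` with `certFile` and `keyFile` keys. That layout and the helper name `find_dangling_cert_paths` are assumptions, so adjust them to what the rendered file actually contains.

```python
import os


def find_dangling_cert_paths(config, exists=os.path.exists):
    """List certFile/keyFile paths in the config that are absent on disk.

    config: rendered kube-apiserver config as a dict (assumed layout).
    exists: path predicate, injectable so the check can be tested off-cluster.
    """
    dangling = []
    named = config.get("servingInfo", {}).get("namedCertificates", [])
    for entry in named:
        for key in ("certFile", "keyFile"):
            path = entry.get(key)
            if path and not exists(path):
                dangling.append(path)
    return dangling
```

An empty result means every referenced file is present; any returned path under /etc/kubernetes/static-pod-certs/secrets/user-serving-cert-000/ matches the failure described here.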
      

      Expected results:

      The Cluster Kube API Server Operator detects that the secret reference is wrong, changes nothing, and only reports itself as degraded with a meaningful message so that the user can fix it. No kube-apiserver pod crash-loops.
      

      Additional info:

      Once the kube-apiserver is broken, fixing the apiserver/cluster object is usually not enough on its own; a manual workaround has to be applied on the crash-looping master. One workaround that works is described in [2], even though that KB article was written for another bug with a different root cause.
      
      References:
      
      [1] - https://docs.openshift.com/container-platform/4.11/security/certificates/api-server.html#api-server-certificates
      [2] - https://access.redhat.com/solutions/4893641
      

            rhn-support-palonsor Pablo Alonso Rodriguez
            openshift-crt-jira-prow OpenShift Prow Bot
            Ke Wang Ke Wang