OCPBUGS-8404

Specifying non-existent secret for API namedCertificates renders inconsistent config and causes kube-apiserver crash-loop


    • Bug
    • Resolution: Done-Errata
    • Undefined
    • 4.14.0
    • 4.11
    • kube-apiserver
    • Moderate
    • No
    • False
    • Previously, kube-apiserver did not change to `Degraded=True` when an invalid secret name was specified for servingCertificate in namedCertificates. With this release, kube-apiserver now switches to `Degraded=True` and shows why the certificate was not accepted to allow for easier troubleshooting. (link:https://issues.redhat.com/browse/OCPBUGS-8404[*OCPBUGS-8404*])
    • Bug Fix
    • Done

      Description of problem:

      If a custom API server certificate is added as described in the documentation[1], but the secret name is wrong and points to a nonexistent secret, the following happens:
      - The kube-apiserver config is rendered with some of the namedCertificates pointing to /etc/kubernetes/static-pod-certs/secrets/user-serving-cert-000/
      - Because the secret named in the apiserver/cluster object is wrong, no user-serving-cert-000 secret is generated, so /etc/kubernetes/static-pod-certs/secrets/user-serving-cert-000/ does not exist (and may be automatically removed if created manually).
      - Together, these two points cause kube-apiserver to start crash-looping, because its config points to nonexistent certificates.
      
      This is a cluster-kube-apiserver-operator bug: the operator should validate that the specified secret exists and, if it does not, report itself as degraded and do nothing, rather than render an inconsistent configuration.
      

      Version-Release number of selected component (if applicable):

      First found in 4.11.13, but also reproduced in the latest nightly build.
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Set up a named certificate pointing to a secret that doesn't exist.
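
Step 1 can be sketched as a merge patch against the apiserver/cluster object. The snippet below only builds the patch body; the host name api.example.com and the secret name does-not-exist are placeholders, and the spec.servingCerts.namedCertificates layout is assumed to match the documentation referenced as [1]. The printed JSON could then be passed to something like `oc patch apiserver cluster --type=merge -p '<json>'`.

```python
import json


def named_cert_patch(hostname: str, secret_name: str) -> dict:
    """Build a merge patch that sets one named certificate on apiserver/cluster."""
    return {
        "spec": {
            "servingCerts": {
                "namedCertificates": [
                    {
                        # Host names the certificate should be served for.
                        "names": [hostname],
                        # Name of the secret that is expected to hold the cert;
                        # here it deliberately refers to a secret that is absent.
                        "servingCertificate": {"name": secret_name},
                    }
                ]
            }
        }
    }


# Placeholder values, chosen only to illustrate the reproduction step.
patch = named_cert_patch("api.example.com", "does-not-exist")
print(json.dumps(patch))
```

Note that a merge patch replaces the whole namedCertificates list, so any entries already configured would need to be included in the patch as well.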
      

      Actual results:

      The rendered configuration is inconsistent and points to a nonexistent secret. The kube-apiserver pods crash-loop.
      

      Expected results:

      The cluster-kube-apiserver-operator detects that the secret is wrong, does nothing, and only reports itself as degraded with a meaningful message so the user can fix the configuration. No kube-apiserver pod crash-loops.
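
The expected behavior can be illustrated with a minimal sketch of the check. This is not the operator's actual code: the names NamedCertificate and check_named_certificates are hypothetical, and the set of existing secrets is passed in directly instead of being read from the cluster.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class NamedCertificate:
    names: List[str]   # host names the certificate is served for
    secret_name: str   # secret referenced by servingCertificate.name


def check_named_certificates(
    certs: List[NamedCertificate], existing_secrets: Set[str]
) -> Tuple[bool, List[str]]:
    """Return (degraded, messages) for references to missing secrets.

    Hypothetical helper for illustration only.
    """
    messages = []
    for cert in certs:
        if cert.secret_name not in existing_secrets:
            messages.append(
                f"secret {cert.secret_name!r} for names {cert.names} not found"
            )
    return (len(messages) > 0, messages)


certs = [NamedCertificate(["api.example.com"], "does-not-exist")]
degraded, messages = check_named_certificates(certs, existing_secrets={"valid-cert"})
if degraded:
    # Keep the previously rendered config untouched and only report the problem.
    print("Degraded=True:", "; ".join(messages))
```

The point of the design is that the check runs before any configuration is rendered, so a bad reference results in a degraded condition with a message naming the secret rather than a config that points to missing files.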
      

      Additional info:

      Once the kube-apiserver is broken, fixing the apiserver/cluster object is usually not enough: a manual workaround typically has to be applied on the crash-looping master. One workaround that works is described in [2], even though that KB article was written for a different bug with a different root cause.
      
      References:
      
      [1] - https://docs.openshift.com/container-platform/4.11/security/certificates/api-server.html#api-server-certificates
      [2] - https://access.redhat.com/solutions/4893641
      

            vrutkovs@redhat.com Vadim Rutkovsky
            rhn-support-palonsor Pablo Alonso Rodriguez
            Rahul Gangwar