OpenShift Bugs / OCPBUGS-62177

Control Plane PKI Operator Sets Certificate Revoked Before KAS Changes Propagated


      Description of problem:

      The control-plane-pki-operator reports the certificate-revoked condition prematurely. It does not wait for each individual kube-apiserver (KAS) pod to reload its CA bundle, so a certificate may remain usable for a short period even after the pki-operator reports it as revoked.

      This window is typically on the order of minutes.
      
          

      Version-Release number of selected component (if applicable):

      
          

      How reproducible:

          Intermittently; this is a race condition.
          

      Steps to Reproduce:

          1. Create an HA control plane.
          2. Request a break-glass certificate.
          3. Create a certificate revocation request (CRR) and wait until the CRR condition PreviousCertificateRevoked is true.
          4. Attempt to use the certificate to authenticate to the KAS. Depending on routing, authentication intermittently succeeds until the CA bundles are reloaded on every KAS instance.
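The intermittent behavior in step 4 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `replica` type, the per-replica `reloadAt` tick, and strict round-robin routing are all hypothetical and are not taken from the operator or from any load balancer configuration. It shows why a probe can alternate between success and failure while some replicas have reloaded their CA bundle and others have not.

```go
package main

import "fmt"

// replica models one kube-apiserver pod (hypothetical type).
// reloadAt is the tick at which the pod picks up the new CA bundle.
type replica struct {
	name     string
	reloadAt int
}

// acceptsRevoked reports whether the replica still trusts the
// revoked certificate at the given tick.
func (r replica) acceptsRevoked(tick int) bool {
	return tick < r.reloadAt
}

// probeRoundRobin sends one request per tick, rotating across the
// replicas (an assumed routing policy), and records for each request
// whether the revoked certificate was still accepted.
func probeRoundRobin(replicas []replica, ticks int) []bool {
	results := make([]bool, 0, ticks)
	for t := 0; t < ticks; t++ {
		r := replicas[t%len(replicas)]
		results = append(results, r.acceptsRevoked(t))
	}
	return results
}

func main() {
	// kas-0 has already reloaded; kas-1 and kas-2 lag behind.
	replicas := []replica{{"kas-0", 0}, {"kas-1", 5}, {"kas-2", 3}}
	for tick, accepted := range probeRoundRobin(replicas, 8) {
		fmt.Printf("tick %d: accepted=%v\n", tick, accepted)
	}
}
```

With these made-up reload times, requests routed to a lagging replica still succeed, which matches the intermittent success described in the steps.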
          

      Actual results:

          The certificate is intermittently accepted until the CA bundles are reloaded on the KAS instances.
          

      Expected results:

          Once the revoked status is reported, the certificate is no longer accepted by any KAS instance.
      
          

      Additional info:

          Logs for this scenario: https://drive.google.com/file/d/1_9U5lQh8Ejb36VqFhUbTSIYqusOfueqZ/view?usp=drive_link
          

      Acceptance criteria:
      Before setting the certificate-revoked condition to true, the operator should:
      a) enumerate the API servers
      b) create a kubeconfig for each
      c) if the connection result matches the existing logic, move on to the next one
      d) if not, reuse the existing requeue logic
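The acceptance criteria above could be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: `probeFunc` and `revokedEverywhere` are hypothetical names, and the probe stands in for whatever per-server check the existing logic performs with a per-server kubeconfig. The condition is only considered satisfied when every enumerated API server rejects the certificate; any other outcome returns a reason that the caller could use when requeueing.

```go
package main

import (
	"errors"
	"fmt"
)

// probeFunc is a hypothetical stand-in for the existing per-connection
// check. It reports whether the given API server endpoint still accepts
// the certificate being revoked.
type probeFunc func(endpoint string) (accepted bool, err error)

// revokedEverywhere returns true only if every endpoint rejects the
// certificate. Otherwise it returns false and a reason, signalling that
// the caller should requeue rather than set the revoked condition.
func revokedEverywhere(endpoints []string, probe probeFunc) (bool, string) {
	if len(endpoints) == 0 {
		// Nothing was verified, so do not report success.
		return false, "no API server endpoints found"
	}
	for _, ep := range endpoints {
		accepted, err := probe(ep)
		if err != nil {
			return false, fmt.Sprintf("probe of %s failed: %v", ep, err)
		}
		if accepted {
			return false, fmt.Sprintf("%s still accepts the certificate", ep)
		}
	}
	return true, ""
}

func main() {
	// Made-up state: kas-1 has not yet reloaded its CA bundle.
	stillTrusting := map[string]bool{"kas-0": false, "kas-1": true, "kas-2": false}
	probe := func(ep string) (bool, error) {
		accepted, ok := stillTrusting[ep]
		if !ok {
			return false, errors.New("unknown endpoint")
		}
		return accepted, nil
	}
	done, reason := revokedEverywhere([]string{"kas-0", "kas-1", "kas-2"}, probe)
	fmt.Println(done, reason)
}
```

Treating an empty endpoint list and probe errors as "not yet revoked" is a deliberately conservative choice in this sketch, so the condition is never reported without a positive check against each server.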

              sminonne@redhat.com Salvatore Dario Minonne
              bvesel.openshift Ben Vesel
              Jie Zhao Jie Zhao