  1. OpenShift Bugs
  2. OCPBUGS-18673

13 month OpenShift Self Signed Certificate rotation does not work


    • Type: Bug
    • Resolution: Not a Bug
    • 4.12.z, 4.11.z
    • service-ca

      Description of problem:

      I have been working with my customer on this issue, and this is what we have discovered:
      
      This problem has caused us a lot of pain. It has wasted a lot of our time. We have had outages to Loki, Quay and Noobaa, all running on OpenShift and all provided by Red Hat, due to this problem.
      
      The documentation, in the section at https://docs.openshift.com/container-platform/4.12/security/certificates/service-serving-certificate.html#understanding-service-serving_service-serving-certificate, says:
      
      "The service CA certificate, which issues the service certificates, is valid for 26 months and is automatically rotated when there is less than 13 months validity left. After rotation, the previous service CA configuration is still trusted until its expiration. This allows a grace period for all affected services to refresh their key material before the expiration. If you do not upgrade your cluster during this grace period, which restarts services and refreshes their key material, you might need to manually restart services to avoid failures after the previous service CA expires."
      
      At the 13-month mark, the original 26-month certificate is replaced with a new certificate that is valid for 26 months from the rotation date. Unfortunately, the original certificate (now 13 months old) is lost. This means any service that does not automatically pick up the new certificate will break. To be clear, updating the secrets in the projects that use this certificate is not enough: the pods running in those projects still hold the old certificate in memory until they are restarted.
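
The restart step described above could be prepared as a reviewable list of commands before anything is run. This is only a sketch: the function name `print_restart_commands` and the namespace and workload names in the comments are hypothetical, and it assumes the affected workloads are Deployments or StatefulSets that `oc rollout restart` can act on.

```shell
#!/usr/bin/env bash
# Print (do not run) one restart command per workload so the list can be
# reviewed before it is executed.
# Input on stdin: lines of "<namespace> <kind>/<name>", for example
#   example-namespace deployment/example-app      (hypothetical names)
print_restart_commands() {
  local ns workload
  while read -r ns workload; do
    # Skip blank lines and comment lines.
    [ -z "$ns" ] && continue
    case "$ns" in \#*) continue ;; esac
    printf 'oc rollout restart %s -n %s\n' "$workload" "$ns"
  done
}

# Example (hypothetical input file, one workload per line):
#   print_restart_commands < workloads.txt
#   print_restart_commands < workloads.txt | sh    # only after reviewing the list
```

Printing the commands first keeps the restart an explicit, auditable step, which matters when the affected services include storage or registry components.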
      
      The output of this query:
      oc get secrets/signing-key -n openshift-service-ca -o template='{{index .data "tls.crt"}}' | base64 --decode
      -----BEGIN CERTIFICATE-----
      MIIDUTCCAjmgAwIBAgIIWMEl+2yguMYwDQYJKoZIhvcNAQELBQAwNjE0MDIGA1UE
      Awwrb3BlbnNoaWZ0LXNlcnZpY2Utc2VydmluZy1zaWduZXJAMTY1OTExMTQzMDAe
      Fw0yMzA4MjgxNjE3MjdaFw0yNTEwMjYxNjE3MjhaMDYxNDAyBgNVBAMMK29wZW5z
      aGlmdC1zZXJ2aWNlLXNlcnZpbmctc2lnbmVyQDE2NTkxMTE0MzAwggEiMA0GCSqG
      SIb3DQEBAQUAA4IBDwAwggEKAoIBAQDEzvQ+VySQK/k/0sKVdwN7J4E4OJ8h+9GC
      rDS38cLnYD3q6I/iC3ZoIZkkCkcbnHSc0/4Q/AKecXsb4pwI+9WPE5w2YQmtY6ey
      2VB6Bg1BYTLw65WsWmm0CjszjMFSxyn3spesKFlYuT8mepC9ynsSofUQFUrEHZk3
      YSq6sz24+KXIzCZS3k7ECGqKSyNZg30jBZmqa8cPAaws/zl9/U/rXP994qsNFruQ
      DcLO1IVHYl650oOT6zswNhlzZ311fNIbf0S8VzgVxiC+TQgQJ1NQar2NmpROMSgX
      Ybw6dFRxodkFfcNQAGcrqWlPCQTxlGGrl5GW5IKjkIYanw5szD9HAgMBAAGjYzBh
      MA4GA1UdDwEB/wQEAwICpDAPBgNVHRMBAf8EBTADAQH/MB0GA1UdDgQWBBQeUF07
      Q3vpPq2XGFc1v9xEqPZqADAfBgNVHSMEGDAWgBQeUF07Q3vpPq2XGFc1v9xEqPZq
      ADANBgkqhkiG9w0BAQsFAAOCAQEAFgsXg4gciulG51Ls8W4mln4HDmYmrFLxwhZQ
      qhYr0pK8p+/WHJ6wjQueMuUK2DRBX1IKnOcz3FbLgTssHp11tBxadQotVCzvaD+g
      AV6njgdxIv4J0KIrONzMnlU31NkO9xRfXzyJHa6frZLxzIZ8glSiUY6U4q2Q6E9P
      /eUQeVxoDthTV4iYzWBS/R3rnNBloB+2PAKUDNyNfnDwcA6f+Q4k818eI8cnbyaz
      iumM/yE8V3pJfDdb1slZHEhEbR6T2DDDP7G0DOoCQ3sSbRwXQwSA2TRG/eVBBenZ
      SDQgReolRpbl5pntsGPmNfmnJv7Wqwaqi3yWZQuvz0wVaH8Ilg==
      -----END CERTIFICATE-----
      
      This reveals a single certificate, so how can the previous certificate still be valid and available for selection?
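
One way to investigate this question is to count the certificates in each location and to check the validity window of the signing certificate. The sketch below makes several assumptions: the helper names `count_pem_certs` and `days_between` are hypothetical, `days_between` relies on GNU `date` syntax, and the ConfigMap name `signing-cabundle` with key `ca-bundle.crt` is an assumption about where the service CA trust bundle lives that should be verified on the cluster. The signing secret holds the certificate currently used for issuing, so a single certificate there may not by itself show that trust in the previous CA was dropped.

```shell
#!/usr/bin/env bash
# Count the PEM certificates read from stdin.
count_pem_certs() {
  # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
  grep -c -- '-----BEGIN CERTIFICATE-----' || true
}

# Whole days between two timestamps (GNU date syntax assumed).
days_between() {
  local start end
  start="$(date -u -d "$1" +%s)" || return 1
  end="$(date -u -d "$2" +%s)" || return 1
  echo $(( (end - start) / 86400 ))
}

# Example usage against a cluster (not run here):
#   oc get secrets/signing-key -n openshift-service-ca \
#     -o template='{{index .data "tls.crt"}}' | base64 --decode | count_pem_certs
#   # Assumed location of the trust bundle; verify the name and key first:
#   oc get configmap/signing-cabundle -n openshift-service-ca \
#     -o template='{{index .data "ca-bundle.crt"}}' | count_pem_certs
#   # Validity window of the signing certificate:
#   oc get secrets/signing-key -n openshift-service-ca \
#     -o template='{{index .data "tls.crt"}}' | base64 --decode \
#     | openssl x509 -noout -subject -dates
```

The certificate printed above appears to encode a validity window of 2023-08-28 16:17:27 UTC to 2025-10-26 16:17:28 UTC (confirm with `openssl x509 -noout -dates`). With those values, `days_between '2023-08-28 16:17:27 UTC' '2025-10-26 16:17:28 UTC'` prints 790, which is roughly the 26 months the documentation describes.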

      Version-Release number of selected component (if applicable):

      OCP 4.11 and OCP 4.12

      How reproducible:

      100%

      Steps to Reproduce:

      1. Check the current certificate (this example uses Noobaa): openssl s_client -connect s3.openshift-storage.svc.cluster.local:443 -showcerts 2>/dev/null </dev/null | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'
      2. Cause an automatic certificate rotation.
      3. Recheck the certificate: openssl s_client -connect s3.openshift-storage.svc.cluster.local:443 -showcerts 2>/dev/null </dev/null | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'. The old certificate has been removed.

      Actual results:

      At the 13-month mark, the original 26-month certificate is replaced with a new certificate that is valid for 26 months from the rotation date. Unfortunately, the original certificate (now 13 months old) is lost.

      Expected results:

      The service CA certificate, which issues the service certificates, is valid for 26 months and is automatically rotated when there is less than 13 months validity left. After rotation, the previous service CA configuration is still trusted until its expiration. This allows a grace period for all affected services to refresh their key material before the expiration. If you do not upgrade your cluster during this grace period, which restarts services and refreshes their key material, you might need to manually restart services to avoid failures after the previous service CA expires.

      Additional info:

       

            slaznick@redhat.com Stanislav Láznička
            rhn-support-andbartl Andy Bartlett
            Xingxing Xia Xingxing Xia