-
Bug
-
Resolution: Not a Bug
-
Undefined
-
None
-
4.12.z, 4.11.z
-
None
-
No
-
False
-
Description of problem:
I have been working with my customer on this issue, and this is what we have discovered.

This problem has caused us a lot of pain and wasted a lot of our time. We have had outages of Loki, Quay and Noobaa, all running on OpenShift and all provided by Red Hat, due to this problem.

The manual [1], in the section https://docs.openshift.com/container-platform/4.12/security/certificates/service-serving-certificate.html#understanding-service-serving_service-serving-certificate, says:

"The service CA certificate, which issues the service certificates, is valid for 26 months and is automatically rotated when there is less than 13 months validity left. After rotation, the previous service CA configuration is still trusted until its expiration. This allows a grace period for all affected services to refresh their key material before the expiration. If you do not upgrade your cluster during this grace period, which restarts services and refreshes their key material, you might need to manually restart services to avoid failures after the previous service CA expires."

At the 13-month mark, the old 26-month certificate is replaced with a new certificate that is valid for 26 months from the rotation date. Unfortunately, the old 26-month certificate (now 13 months old) is lost. This means any service that does not automatically pick up the new 26-month certificate will break.

To be clear, just updating the secrets in the various projects that use this new certificate is not enough. The pods running in those projects will still hold the old certificate in memory unless they are restarted.
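As a rough sketch of the manual restart mentioned above, the helper below triggers a rolling restart of the common workload kinds in one namespace. The function name `restart_workloads` and the `OC` override variable are hypothetical and not part of any Red Hat tooling; `oc rollout restart` is assumed to behave like its `kubectl` counterpart. Operator-managed components may need a different procedure.

```shell
#!/bin/sh
# Hypothetical helper: roll every Deployment, StatefulSet and DaemonSet in a
# namespace so that the pods re-read their mounted certificates.
# OC can be overridden (for example OC=echo) to preview the commands.
restart_workloads() {
    ns=$1
    for kind in deployment statefulset daemonset; do
        "${OC:-oc}" rollout restart "$kind" -n "$ns"
    done
}
```

Running `OC=echo restart_workloads openshift-storage` only prints the arguments that would be passed to `oc`, which is a cheap way to review them before touching the cluster.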
Using this query:

oc get secrets/signing-key -n openshift-service-ca -o template='{{index .data "tls.crt"}}' | base64 --decode

-----BEGIN CERTIFICATE-----
MIIDUTCCAjmgAwIBAgIIWMEl+2yguMYwDQYJKoZIhvcNAQELBQAwNjE0MDIGA1UE
Awwrb3BlbnNoaWZ0LXNlcnZpY2Utc2VydmluZy1zaWduZXJAMTY1OTExMTQzMDAe
Fw0yMzA4MjgxNjE3MjdaFw0yNTEwMjYxNjE3MjhaMDYxNDAyBgNVBAMMK29wZW5z
aGlmdC1zZXJ2aWNlLXNlcnZpbmctc2lnbmVyQDE2NTkxMTE0MzAwggEiMA0GCSqG
SIb3DQEBAQUAA4IBDwAwggEKAoIBAQDEzvQ+VySQK/k/0sKVdwN7J4E4OJ8h+9GC
rDS38cLnYD3q6I/iC3ZoIZkkCkcbnHSc0/4Q/AKecXsb4pwI+9WPE5w2YQmtY6ey
2VB6Bg1BYTLw65WsWmm0CjszjMFSxyn3spesKFlYuT8mepC9ynsSofUQFUrEHZk3
YSq6sz24+KXIzCZS3k7ECGqKSyNZg30jBZmqa8cPAaws/zl9/U/rXP994qsNFruQ
DcLO1IVHYl650oOT6zswNhlzZ311fNIbf0S8VzgVxiC+TQgQJ1NQar2NmpROMSgX
Ybw6dFRxodkFfcNQAGcrqWlPCQTxlGGrl5GW5IKjkIYanw5szD9HAgMBAAGjYzBh
MA4GA1UdDwEB/wQEAwICpDAPBgNVHRMBAf8EBTADAQH/MB0GA1UdDgQWBBQeUF07
Q3vpPq2XGFc1v9xEqPZqADAfBgNVHSMEGDAWgBQeUF07Q3vpPq2XGFc1v9xEqPZq
ADANBgkqhkiG9w0BAQsFAAOCAQEAFgsXg4gciulG51Ls8W4mln4HDmYmrFLxwhZQ
qhYr0pK8p+/WHJ6wjQueMuUK2DRBX1IKnOcz3FbLgTssHp11tBxadQotVCzvaD+g
AV6njgdxIv4J0KIrONzMnlU31NkO9xRfXzyJHa6frZLxzIZ8glSiUY6U4q2Q6E9P
/eUQeVxoDthTV4iYzWBS/R3rnNBloB+2PAKUDNyNfnDwcA6f+Q4k818eI8cnbyaz
iumM/yE8V3pJfDdb1slZHEhEbR6T2DDDP7G0DOoCQ3sSbRwXQwSA2TRG/eVBBenZ
SDQgReolRpbl5pntsGPmNfmnJv7Wqwaqi3yWZQuvz0wVaH8Ilg==
-----END CERTIFICATE-----

reveals a single certificate, so how can the previous certificate be valid and available for selection?
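To make the "single certificate" observation repeatable, here is a minimal sketch that counts the PEM certificate blocks in whatever is piped into it. The function name `count_pem_certs` is hypothetical; it only counts `BEGIN CERTIFICATE` markers and does not validate the certificates.

```shell
#!/bin/sh
# Hypothetical helper: count the PEM certificate blocks read from stdin.
# grep -c exits non-zero when nothing matches, so "|| true" keeps the
# function usable under "set -e" while still printing 0.
count_pem_certs() {
    grep -c -e '-----BEGIN CERTIFICATE-----' || true
}
```

Appending `| count_pem_certs` to the query above gives the number of certificates stored in the secret; the same helper can be pointed at any other PEM source, such as a CA bundle, to see whether it carries more than one certificate.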
Version-Release number of selected component (if applicable):
OCP 4.11 and OCP 4.12
How reproducible:
100%
Steps to Reproduce:
1. Check the current cert. For our example we are using Noobaa:
   openssl s_client -connect s3.openshift-storage.svc.cluster.local:443 -showcerts 2>/dev/null </dev/null | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'
2. Cause an automatic certificate rotation.
3. Recheck the cert:
   openssl s_client -connect s3.openshift-storage.svc.cluster.local:443 -showcerts 2>/dev/null </dev/null | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'
   We can see the old cert has been removed.
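The before and after check in these steps can be sketched as a comparison of two saved captures. The function names `extract_pem` and `compare_captures`, and the capture files passed to them, are hypothetical; the `sed` filter is the one already used in the steps.

```shell
#!/bin/sh
# Keep only the PEM certificate blocks from a stream
# (same sed filter as in the reproduction steps).
extract_pem() {
    sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'
}

# Hypothetical helper: compare the certificate blocks in two saved captures
# and print "unchanged" or "changed". Text outside the PEM blocks is ignored.
compare_captures() {
    before=$1
    after=$2
    if [ "$(extract_pem < "$before")" = "$(extract_pem < "$after")" ]; then
        echo "unchanged"
    else
        echo "changed"
    fi
}
```

For example, redirect the `openssl s_client` output from step 1 and step 3 into two files and pass both file names to `compare_captures`.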
Actual results:
At the 13-month mark, the old 26-month certificate is replaced with a new certificate that is valid for 26 months from the rotation date. Unfortunately, the old 26-month certificate (now 13 months old) is lost.
Expected results:
The service CA certificate, which issues the service certificates, is valid for 26 months and is automatically rotated when there is less than 13 months validity left. After rotation, the previous service CA configuration is still trusted until its expiration. This allows a grace period for all affected services to refresh their key material before the expiration. If you do not upgrade your cluster during this grace period, which restarts services and refreshes their key material, you might need to manually restart services to avoid failures after the previous service CA expires.
Additional info: