-
Bug
-
Resolution: Not a Bug
-
Undefined
-
None
-
4.12.z, 4.11.z
-
None
-
Quality / Stability / Reliability
-
False
-
-
None
-
None
-
No
-
None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
Description of problem:
I have been working with my customer on this issue, and this is what we have discovered:
This problem has caused us a lot of pain. It has wasted a lot of our time. We have had outages to Loki, Quay and Noobaa, all running on OpenShift and all provided by Red Hat, due to this problem.
Manual [1] in section -> https://docs.openshift.com/container-platform/4.12/security/certificates/service-serving-certificate.html#understanding-service-serving_service-serving-certificate
Says
"The service CA certificate, which issues the service certificates, is valid for 26 months and is automatically rotated when there is less than 13 months validity left. After rotation, the previous service CA configuration is still trusted until its expiration. This allows a grace period for all affected services to refresh their key material before the expiration. If you do not upgrade your cluster during this grace period, which restarts services and refreshes their key material, you might need to manually restart services to avoid failures after the previous service CA expires."
At the 13 month period the old 26 Month certificate gets updated with a new certificate which is valid for 26 month period from 13 month rotation period. Unfortunately the old 26 month certificate (now 13 months old) gets lost. This means any service that does not automatically pickup the new 26 month certificate will break. To be clear just updating the secrets in the various projects that use this new certificate is not enough. The PODs running in those projects will still have the now old certificate in RAM unless they are restarted.
Using this query -> oc get secrets/signing-key -n openshift-service-ca -o template='{{index .data "tls.crt"}}' | base64 --decode
-----BEGIN CERTIFICATE-----
MIIDUTCCAjmgAwIBAgIIWMEl+2yguMYwDQYJKoZIhvcNAQELBQAwNjE0MDIGA1UE
Awwrb3BlbnNoaWZ0LXNlcnZpY2Utc2VydmluZy1zaWduZXJAMTY1OTExMTQzMDAe
Fw0yMzA4MjgxNjE3MjdaFw0yNTEwMjYxNjE3MjhaMDYxNDAyBgNVBAMMK29wZW5z
aGlmdC1zZXJ2aWNlLXNlcnZpbmctc2lnbmVyQDE2NTkxMTE0MzAwggEiMA0GCSqG
SIb3DQEBAQUAA4IBDwAwggEKAoIBAQDEzvQ+VySQK/k/0sKVdwN7J4E4OJ8h+9GC
rDS38cLnYD3q6I/iC3ZoIZkkCkcbnHSc0/4Q/AKecXsb4pwI+9WPE5w2YQmtY6ey
2VB6Bg1BYTLw65WsWmm0CjszjMFSxyn3spesKFlYuT8mepC9ynsSofUQFUrEHZk3
YSq6sz24+KXIzCZS3k7ECGqKSyNZg30jBZmqa8cPAaws/zl9/U/rXP994qsNFruQ
DcLO1IVHYl650oOT6zswNhlzZ311fNIbf0S8VzgVxiC+TQgQJ1NQar2NmpROMSgX
Ybw6dFRxodkFfcNQAGcrqWlPCQTxlGGrl5GW5IKjkIYanw5szD9HAgMBAAGjYzBh
MA4GA1UdDwEB/wQEAwICpDAPBgNVHRMBAf8EBTADAQH/MB0GA1UdDgQWBBQeUF07
Q3vpPq2XGFc1v9xEqPZqADAfBgNVHSMEGDAWgBQeUF07Q3vpPq2XGFc1v9xEqPZq
ADANBgkqhkiG9w0BAQsFAAOCAQEAFgsXg4gciulG51Ls8W4mln4HDmYmrFLxwhZQ
qhYr0pK8p+/WHJ6wjQueMuUK2DRBX1IKnOcz3FbLgTssHp11tBxadQotVCzvaD+g
AV6njgdxIv4J0KIrONzMnlU31NkO9xRfXzyJHa6frZLxzIZ8glSiUY6U4q2Q6E9P
/eUQeVxoDthTV4iYzWBS/R3rnNBloB+2PAKUDNyNfnDwcA6f+Q4k818eI8cnbyaz
iumM/yE8V3pJfDdb1slZHEhEbR6T2DDDP7G0DOoCQ3sSbRwXQwSA2TRG/eVBBenZ
SDQgReolRpbl5pntsGPmNfmnJv7Wqwaqi3yWZQuvz0wVaH8Ilg==
-----END CERTIFICATE-----
Reveals a single certificate so how can the previous certificate be valid and available for selection?
Version-Release number of selected component (if applicable):
OCP 4.11 and OCP 4.12
How reproducible:
100%
Steps to Reproduce:
1. Check the current cert for our example we are using Noobaa: openssl s_client -connect s3.openshift-storage.svc.cluster.local:443 -showcerts 2>/dev/null </dev/null | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' 2. Cause an automatic certificate rotation 3. When we recheck the cert openssl s_client -connect s3.openshift-storage.svc.cluster.local:443 -showcerts 2>/dev/null </dev/null | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' we can see the old cert has been removed 4.
Actual results:
At the 13 month period the old 26 Month certificate gets updated with a new certificate which is valid for 26 month period from 13 month rotation period. Unfortunately the old 26 month certificate (now 13 months old) gets lost.
Expected results:
The service CA certificate, which issues the service certificates, is valid for 26 months and is automatically rotated when there is less than 13 months validity left. After rotation, the previous service CA configuration is still trusted until its expiration. This allows a grace period for all affected services to refresh their key material before the expiration. If you do not upgrade your cluster during this grace period, which restarts services and refreshes their key material, you might need to manually restart services to avoid failures after the previous service CA expires
Additional info: