Bug
Resolution: Unresolved
4.16.z, 4.20.0
Quality / Stability / Reliability
x86_64
Description of problem:
A customer discovered that after updating their custom ingress certificate, they could no longer log in to their cluster. Using the kubeconfig file, they found that the v4-0-config-system-router-certs secret in the openshift-authentication namespace contained a corrupted certificate. Specifically, the contents of tls.crt and tls.key ended up on the same line:
...
hX0461AR1eKD6thzHXO2D8DUzSP4JdWiC6n3yyJvNCLJ3z7fSqe8Y8UvyxBoa1rI
EwB10Cd7DAGBUXQOfpv0JpbKb4GuSVxNv2bHl5op5I3N6C3vJBSZA/jzsHYVX8jF
T3GASQnmVEGgqeC8lpawMw==
-----END CERTIFICATE----------BEGIN CERTIFICATE-----
MIIDfzCCAmegAwIBAgIIHNpb3HsmtbwwDQYJKoZIhvcNAQELBQAwJjEkMCIGA1UE
AwwbaW5ncmVzcy1vcGVyYXRvckAxNzYxNTc0ODY0MB4XDTI1MTAyNzE0MjEwNVoX
...
They determined that this happens because the tls.crt in the custom ingress certificate secret did not end with a newline character. Adding a trailing newline to the tls.crt key in the custom ingress secret caused v4-0-config-system-router-certs to be updated correctly.
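The failure mode can be illustrated with a minimal sketch. It assumes the combined secret is built by plain concatenation of the two values, which the observed output suggests but this report does not confirm. The function name `naive_concat` and the PEM bodies are hypothetical placeholders, not real key material or operator code.

```python
# Placeholder PEM blocks (hypothetical, not real certificate or key data).
CERT_NO_NEWLINE = "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----"
KEY = "-----BEGIN PRIVATE KEY-----\nBBBB\n-----END PRIVATE KEY-----\n"


def naive_concat(cert: str, key: str) -> str:
    """Join two PEM values without checking how the first one ends (assumed behavior)."""
    return cert + key


# Without a trailing newline, the END and BEGIN markers share one line.
fused = naive_concat(CERT_NO_NEWLINE, KEY)
# With the newline added (the workaround), each marker stays on its own line.
ok = naive_concat(CERT_NO_NEWLINE + "\n", KEY)

print(fused.splitlines()[2])
print(ok.splitlines()[2])
```

The first printed line shows the fused boundary, and the second shows the clean `-----END CERTIFICATE-----` line produced once the newline is present.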
Version-Release number of selected component (if applicable):
The customer is running 4.16, and I was able to recreate the issue in a 4.20.0 lab environment.
How reproducible:
It is easiest to reproduce in the Console. Edit the tls.crt key in the custom ingress secret (or in the router-certs-default secret if there is no custom certificate) and remove the newline character from the end of the value. This rolls out a new router certs secret that prevents authentication from working.
Steps to Reproduce:
1. Remove the trailing newline character from the tls.crt key in the custom (or default) ingress certificate secret.
2. Wait a few moments for the updated secret to be propagated to the v4-0-config-system-router-certs secret in the openshift-authentication namespace.
3. Note that v4-0-config-system-router-certs has "END CERTIFICATE" and "BEGIN CERTIFICATE" on the same line and that the authentication ClusterOperator reports an error:
"OAuthServerDeploymentAvailable: no oauth-openshift.openshift-authentication pods available on any node...."
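The check in step 3 can be automated with a small sketch like the one below. It assumes the secret value has already been retrieved and base64-decoded into text; the function name `find_fused_boundaries` is hypothetical and only looks for an END marker followed by a BEGIN marker on the same line.

```python
import re
from typing import List

# Matches an END marker immediately followed by a BEGIN marker on one line.
_FUSED = re.compile(r"-----END [A-Z0-9 ]+-----[ \t]*-----BEGIN [A-Z0-9 ]+-----")


def find_fused_boundaries(pem_text: str) -> List[int]:
    """Return the 1-based line numbers where two PEM markers share a line."""
    return [
        number
        for number, line in enumerate(pem_text.splitlines(), start=1)
        if _FUSED.search(line)
    ]


# Excerpt shaped like the corrupted secret shown in the description.
sample = (
    "T3GASQnmVEGgqeC8lpawMw==\n"
    "-----END CERTIFICATE----------BEGIN CERTIFICATE-----\n"
    "MIIDfzCCAmegAwIBAgIIHNpb3HsmtbwwDQYJKoZIhvcNAQELBQAwJjEkMCIGA1UE\n"
)
print(find_fused_boundaries(sample))
```

An empty list means no fused boundary was found. A non-empty list points at the lines to inspect in the decoded secret.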
Actual results:
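The authentication ClusterOperator reports an error, and the v4-0-config-system-router-certs secret in the openshift-authentication namespace is corrupted.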
Expected results:
A newline character should be added automatically when tls.crt does not end with one, so that the openshift-authentication router certs secret is not corrupted.
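One way the expected behavior could look is sketched below. This is only an illustration of the normalization idea, not the operator's actual code; the helper names `ensure_trailing_newline` and `join_pem_blocks` are hypothetical.

```python
def ensure_trailing_newline(pem: str) -> str:
    """Append a newline to a non-empty PEM value that does not already end with one."""
    if not pem or pem.endswith("\n"):
        return pem
    return pem + "\n"


def join_pem_blocks(*blocks: str) -> str:
    """Concatenate PEM values so that every block ends on its own line."""
    return "".join(ensure_trailing_newline(block) for block in blocks)


# Placeholder values (hypothetical, not real certificate or key data).
cert = "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----"
key = "-----BEGIN PRIVATE KEY-----\nBBBB\n-----END PRIVATE KEY-----\n"
combined = join_pem_blocks(cert, key)
print(combined)
```

Values that already end with a newline pass through unchanged, so normalizing them repeatedly does not add blank lines between blocks.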
Additional info:
The customer did not open a case because they know the workaround, and I was able to recreate the issue very easily in a lab cluster.