-
Bug
-
Resolution: Done-Errata
-
Normal
-
4.13.z
-
None
-
No
-
0
-
WINC - Sprint 254, WINC - Sprint 255
-
2
-
False
-
-
Previously, the contents of the kubelet-ca.crt file on Windows nodes were not populated correctly after the rotation of the kube-apiserver-to-kubelet-client-ca certificate. This fix corrects the issue.
-
Bug Fix
-
In Progress
This is a clone of issue OCPBUGS-22237. The following is the description of the original issue:
—
Description of problem:
We replaced our self-signed certificate in OpenShift with one issued by a trusted CA. Everything works fine: the Linux nodes were updated, and so were the Router and API. After the change, we see the following entries in the Windows kubelet logs:
~~~
w-amd-342-zw4l2 E0921 23:27:36.567178 1852 dynamic_cafile_content.go:237] key failed with : open C:\k\kubelet-ca.crt: The process cannot access the file because it is being used by another process.
w-amd-342-zw4l2 I0921 23:27:36.569637 1852 dynamic_cafile_content.go:119] "Loaded a new CA Bundle and Verifier" name="client-ca-bundle::C:\\k\\kubelet-ca.crt"
w-amd-342-zw4l2 E0921 23:27:36.818376 1852 server.go:299] "Unable to authenticate the request due to an error" err="verifying certificate SN=294212011553307207576684339170806169001, SKID=, AKID=B4:69:72:1F:BA:7D:6F:de:9B:4A:A8:AA:58:56:77:87:29:22:CD:92 failed: x509: certificate signed by unknown authority"
~~~
*Observations*:
===========
We found that kubelet-ca.crt is not updated on the older Windows nodes. To isolate the issue during the call with the customer, we created a new Windows workload and saw no "x509: certificate signed by unknown authority" error after its creation. We then compared the C:\k\kubelet-ca.crt files on the old and new machines, found differences, and concluded that kubelet-ca.crt was not updated on the older Windows machine.
After some time (around 2 hours), however, we found "x509: certificate signed by unknown authority" errors. It looks like nodes are created with the correct kubelet-ca.crt, but a few hours later the file is replaced with the old one.
As a workaround, we downloaded C:\k\kubelet-ca.crt from the new Windows machine, copied it onto one of the old Windows workload machines, and rebooted that machine. The error disappeared for some time, but the file then reverted to the older kubelet-ca.crt.
This indicates that C:\k\kubelet-ca.crt is not updated properly on the older Windows instances.
On the new Windows machine, C:\k\kubelet-ca.crt is correct initially, but it later reverts to the older kubelet-ca.crt as well.
Version-Release number of selected component (if applicable):
How reproducible:
Every time a new Windows node is added to the cluster or an existing Windows node is re-created.
Steps to Reproduce:
1. Check the contents of C:\k\kubelet-ca.crt on each Windows node.
2. Rotate the kube-apiserver-to-kubelet-signer cert:
   $ oc patch -n openshift-kube-apiserver-operator secret kube-apiserver-to-kubelet-signer --type='json' -p='[{"op": "replace", "path": "/metadata/annotations/auth.openshift.io~1certificate-not-after","value": null }]'
3. After a short time, check the new contents of C:\k\kubelet-ca.crt on each Windows node.
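Comparing the file contents in steps 1 and 3 by eye is error-prone, because the bundle can hold several certificates. The following is a minimal sketch, not part of the original report, that lists one SHA-256 fingerprint per certificate in a PEM bundle so that two copies of the file can be compared. The function name `bundle_fingerprints` is hypothetical, and the sketch only hashes the decoded bytes of each PEM block; it does not validate the certificates.

```python
import base64
import hashlib
import re

# Matches one PEM-encoded certificate block; DOTALL lets the body span lines.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def bundle_fingerprints(pem_text: str) -> list[str]:
    """Return the SHA-256 fingerprint of every certificate in a PEM bundle.

    Hypothetical helper for illustration; it does not parse or verify the
    certificates, it only hashes the decoded bytes of each block.
    """
    fingerprints = []
    for match in PEM_CERT.finditer(pem_text):
        # Drop all whitespace so line wrapping and CRLF endings (common on
        # Windows nodes) do not change the resulting hash.
        der = base64.b64decode("".join(match.group(1).split()))
        fingerprints.append(hashlib.sha256(der).hexdigest())
    return fingerprints
```

Assuming the file has been copied off the node before and after the rotation (for example as `before.crt` and `after.crt`, both hypothetical names), calling `bundle_fingerprints` on the text of each copy gives two lists that can be compared directly.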
Actual results:
After the change, they started seeing "x509: certificate signed by unknown authority" in the Windows kubelet logs:
~~~
w-amd-342-zw4l2 E0921 23:27:36.567178 1852 dynamic_cafile_content.go:237] key failed with : open C:\k\kubelet-ca.crt: The process cannot access the file because it is being used by another process.
w-amd-342-zw4l2 I0921 23:27:36.569637 1852 dynamic_cafile_content.go:119] "Loaded a new CA Bundle and Verifier" name="client-ca-bundle::C:\\k\\kubelet-ca.crt"
w-amd-342-zw4l2 E0921 23:27:36.818376 1852 server.go:299] "Unable to authenticate the request due to an error" err="verifying certificate SN=294212011553307207576684339170806169001, SKID=, AKID=B4:69:72:1F:BA:7D:6F:de:9B:4A:A8:AA:58:56:77:87:29:22:CD:92 failed: x509: certificate signed by unknown authority"
~~~
The file c:\k\kubelet-ca.crt either has the same contents as before the rotation, or has had some certs removed and new certs added.
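To find the affected requests in a longer kubelet log, the entries can be filtered mechanically. The sketch below assumes each line has the shape shown in the excerpt above (node name, then severity and date, time, process ID, source location, and message); the regular expression and the function name `unknown_authority_errors` are illustrative assumptions, not part of the kubelet or WMCO.

```python
import re

# Line shape assumed from the excerpt in this report:
#   <node> <severity><MMDD> <hh:mm:ss.micros> <pid> <file>:<line>] <message>
KLOG_LINE = re.compile(
    r"^(?P<node>\S+)\s+"
    r"(?P<severity>[IWEF])(?P<date>\d{4})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>\d+)\s+"
    r"(?P<source>[^\]\s]+:\d+)\]\s+"
    r"(?P<message>.*)$"
)

NEEDLE = "certificate signed by unknown authority"


def unknown_authority_errors(log_lines):
    """Return parsed error entries that report an untrusted certificate.

    Lines that do not match the assumed shape are skipped silently.
    """
    hits = []
    for line in log_lines:
        match = KLOG_LINE.match(line.strip())
        if match is None:
            continue
        entry = match.groupdict()
        if entry["severity"] == "E" and NEEDLE in entry["message"]:
            hits.append(entry)
    return hits
```

Each returned entry is a dictionary, so the node name and source location (for example `server.go:299`) can be used to group failures per node.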
Expected results:
The file c:\k\kubelet-ca.crt retains the certs present in it before the rotation, and contains new certs as well.
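The expected and actual results can be expressed as a set comparison between the bundle before and after the rotation. The following sketch is an illustration only: the function names and the three result labels are hypothetical, and certificates are identified by a SHA-256 hash of their decoded PEM body rather than by parsing them.

```python
import base64
import hashlib
import re

PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def cert_set(pem_text: str) -> set[str]:
    """Set of SHA-256 fingerprints for the certificates in a PEM bundle."""
    return {
        hashlib.sha256(base64.b64decode("".join(m.group(1).split()))).hexdigest()
        for m in PEM_CERT.finditer(pem_text)
    }


def classify_rotation(before_pem: str, after_pem: str) -> str:
    """Compare a CA bundle before and after rotation.

    Labels (hypothetical, chosen for this sketch):
      "unchanged"     - same certificates as before the rotation
      "certs-removed" - at least one earlier certificate is missing
      "expected"      - every earlier certificate kept, new ones added
    """
    before, after = cert_set(before_pem), cert_set(after_pem)
    if after == before:
        return "unchanged"
    if not before <= after:
        return "certs-removed"
    # after differs from before and contains all of it, so it is a strict
    # superset: the old certificates were retained and new ones were added.
    return "expected"
```

The first two labels correspond to the two failure modes described under the actual results; only "expected" matches the behavior described here.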
Additional info:
With the first customer, this issue was triggered right after patching a custom CA for the API and Ingress. This could be coincidental, because technically a custom CA has nothing to do with kubelet certificates. We now have another customer for whom no such patching of the Ingress or API certificate is mentioned, yet the kubelet still fails to validate the CA cert.
- blocks
-
OCPBUGS-35572 Windows workload didn't update kubelet-ca.crt on the node after the certificate update in Openshift
- Closed
- clones
-
OCPBUGS-22237 Windows workload didn't update kubelet-ca.crt on the node after the certificate update in Openshift
- Closed
- is blocked by
-
OCPBUGS-22237 Windows workload didn't update kubelet-ca.crt on the node after the certificate update in Openshift
- Closed
- is cloned by
-
OCPBUGS-35572 Windows workload didn't update kubelet-ca.crt on the node after the certificate update in Openshift
- Closed
- links to
-
RHBA-2024:132259 Red Hat OpenShift for Windows Containers 10.15.3 product release