  1. OpenShift Bugs
  2. OCPBUGS-22237

Windows workload didn't update kubelet-ca.crt on the node after the certificate update in OpenShift


    • Bug
    • Resolution: Done-Errata
    • Normal
    • 4.16.0
    • 4.13.z
    • Windows Containers
    • None
    • No
    • 3
    • WINC - Sprint 246, WINC - Sprint 248, WINC - Sprint 249, WINC - Sprint 251, WINC - Sprint 253, WINC - Sprint 254
    • 6
    • False
    • Fixes an issue where the contents of the kubelet-ca.crt file on Windows nodes were not being populated correctly after the rotation of the kube-apiserver-to-kubelet-client-ca certificate.
    • Bug Fix
    • In Progress

      Description of problem:

      We replaced our self-signed certificate in OpenShift with one from a trusted CA. Everything works fine on the Linux side: the Linux nodes, the router, and the API were all updated.
      
      After the change, we see the following entries in the Windows kubelet logs:
      ~~~
      w-amd-342-zw4l2 E0921 23:27:36.567178    1852 dynamic_cafile_content.go:237] key failed with : open C:\k\kubelet-ca.crt: The process cannot access the file because it is being used by another process.
      w-amd-342-zw4l2 I0921 23:27:36.569637    1852 dynamic_cafile_content.go:119] "Loaded a new CA Bundle and Verifier" name="client-ca-bundle::C:\\k\\kubelet-ca.crt"
      w-amd-342-zw4l2 E0921 23:27:36.818376    1852 server.go:299] "Unable to authenticate the request due to an error" err="verifying certificate SN=294212011553307207576684339170806169001, SKID=, AKID=B4:69:72:1F:BA:7D:6F:de:9B:4A:A8:AA:58:56:77:87:29:22:CD:92 failed: x509: certificate signed by unknown authority"
      ~~~
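
The excerpt above shows two distinct failure signatures: a sharing violation when the kubelet opens C:\k\kubelet-ca.crt, and the later x509 verification failure. As a rough triage aid, the sketch below counts both in a saved kubelet log. It is only an illustration: the category names and function names are hypothetical, and the match strings are copied from the excerpt, so they may need adjusting for other kubelet versions.

```python
from typing import Iterable, Optional

# Substrings copied from the log excerpt; the dictionary keys are
# illustrative labels, not names used by the kubelet or by OpenShift.
SIGNATURES = {
    "ca_file_locked": "being used by another process",
    "unknown_authority": "x509: certificate signed by unknown authority",
}


def classify_kubelet_line(line: str) -> Optional[str]:
    """Return the label of the first signature found in the line, or None."""
    for label, needle in SIGNATURES.items():
        if needle in line:
            return label
    return None


def summarize(lines: Iterable[str]) -> dict:
    """Count how many lines match each failure signature."""
    counts = {label: 0 for label in SIGNATURES}
    for line in lines:
        label = classify_kubelet_line(line)
        if label is not None:
            counts[label] += 1
    return counts
```

Running `summarize(open(path, encoding="utf-8"))` on a saved log (the path is whatever file the log was exported to) gives a quick view of whether the file-lock error and the authentication failure occur together.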
      
      *Observations*:
      ===========
      We found that kubelet-ca.crt is not updated on the older Windows nodes.
      
      To isolate the issue during the call with the customer, we created a new Windows machine and saw no "x509: certificate signed by unknown authority" errors after it was created.
      
      We then compared C:\k\kubelet-ca.crt on the old and new machines, found differences, and concluded that kubelet-ca.crt had not been updated on the older Windows machine.
      
      After some time (around 2 hours), the "x509: certificate signed by unknown authority" errors appeared again. It looks like nodes are created with the correct kubelet-ca.crt, but a few hours later the file is replaced with the old one.
      
      As a workaround, we downloaded C:\k\kubelet-ca.crt from the new Windows machine, copied it to one of the old Windows machines, and rebooted that machine. The error disappeared for some time, but the file was then reverted to the older kubelet-ca.crt.
      
      In summary, C:\k\kubelet-ca.crt is not updated properly on the older Windows instances. On a new Windows machine the file is correct initially, but it is later reverted to the older kubelet-ca.crt.
      
      

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Every time a new Windows node is added to the cluster or an existing Windows node is re-created.

      Steps to Reproduce:

      1. Check contents of C:\k\kubelet-ca.crt on each Windows node
      
      2. Rotate the kube-apiserver-to-kubelet-signer cert
      $ oc patch  -n openshift-kube-apiserver-operator secret kube-apiserver-to-kubelet-signer --type='json' -p='[{"op": "replace", "path": "/metadata/annotations/auth.openshift.io~1certificate-not-after","value": null }]' 
      
      3. After a short time, check the new contents of C:\k\kubelet-ca.crt on each Windows node.
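
A note on the patch in step 2: the `~1` in the path is the JSON Pointer (RFC 6901) escape for `/`, so the path targets the annotation `auth.openshift.io/certificate-not-after`. The sketch below shows how such a patch document can be built. The helper names are hypothetical and exist only for illustration.

```python
import json


def escape_pointer_token(token: str) -> str:
    """Escape one JSON Pointer reference token per RFC 6901.

    '~' must be replaced first so the '~1' produced for '/' is not re-escaped.
    """
    return token.replace("~", "~0").replace("/", "~1")


def annotation_patch(annotation: str, value=None) -> str:
    """Build a JSON Patch document that replaces a single metadata annotation."""
    path = "/metadata/annotations/" + escape_pointer_token(annotation)
    return json.dumps([{"op": "replace", "path": path, "value": value}])


if __name__ == "__main__":
    # Builds a patch with the same op, path, and null value as the -p argument in step 2.
    print(annotation_patch("auth.openshift.io/certificate-not-after"))
```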
       

      Actual results:

      After the change, they started seeing "x509: certificate signed by unknown authority" in the Windows kubelet logs:
      ~~~
      w-amd-342-zw4l2 E0921 23:27:36.567178    1852 dynamic_cafile_content.go:237] key failed with : open C:\k\kubelet-ca.crt: The process cannot access the file because it is being used by another process.
      w-amd-342-zw4l2 I0921 23:27:36.569637    1852 dynamic_cafile_content.go:119] "Loaded a new CA Bundle and Verifier" name="client-ca-bundle::C:\\k\\kubelet-ca.crt"
      w-amd-342-zw4l2 E0921 23:27:36.818376    1852 server.go:299] "Unable to authenticate the request due to an error" err="verifying certificate SN=294212011553307207576684339170806169001, SKID=, AKID=B4:69:72:1F:BA:7D:6F:de:9B:4A:A8:AA:58:56:77:87:29:22:CD:92 failed: x509: certificate signed by unknown authority"
      ~~~
      The file c:\k\kubelet-ca.crt has either the same contents as before the rotation, or has some certs removed and new certs added.
      

      Expected results:

      The file c:\k\kubelet-ca.crt retains the certs present in it before the rotation, and contains new certs as well.
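
The expected result can be checked mechanically by comparing copies of the bundle saved before and after the rotation. The sketch below is a minimal illustration, not part of any OpenShift or WMCO tooling: it fingerprints each PEM certificate block by hashing its decoded bytes, without parsing the X.509 structure, and reports which certificates were dropped, added, or retained. All function names are hypothetical.

```python
import base64
import hashlib
import re

# Matches the base64 body of each certificate block in a PEM bundle.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def cert_fingerprints(bundle_text: str) -> set:
    """Return the SHA-256 hex digest of the decoded bytes of every certificate block."""
    fingerprints = set()
    for body in PEM_CERT.findall(bundle_text):
        der = base64.b64decode("".join(body.split()))
        fingerprints.add(hashlib.sha256(der).hexdigest())
    return fingerprints


def compare_bundles(before: str, after: str) -> dict:
    """Classify certificates as dropped, added, or retained across a rotation."""
    old, new = cert_fingerprints(before), cert_fingerprints(after)
    return {
        "dropped": sorted(old - new),
        "added": sorted(new - old),
        "retained": sorted(old & new),
    }


def rotation_looks_correct(before: str, after: str) -> bool:
    """True when no old certificate was dropped and at least one new one was added."""
    diff = compare_bundles(before, after)
    return not diff["dropped"] and bool(diff["added"])
```

To use it, read the two saved copies of C:\k\kubelet-ca.crt as text and pass them to `rotation_looks_correct`. A `False` result with a non-empty `dropped` list would match the "some certs removed" symptom described under Actual results.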
      

      Additional info:

      For the first customer, this issue was triggered right after patching a custom CA for the API and Ingress. This could be coincidental, because a custom CA technically has nothing to do with the kubelet certificates. We now have another customer who did not mention any such patching of the Ingress or API certificates, yet the kubelet still fails to validate the CA cert.

              rh-ee-ssoto Sebastian Soto
              rhn-support-mbagga Mithilesh Bagga
              Aharon Rasouli Aharon Rasouli