OCPBUGS-33875 (OpenShift Bugs)

Windows workload did not update kubelet-ca.crt on the node after the certificate update in OpenShift


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Normal
    • 4.15.z
    • 4.13.z
    • Windows Containers
    • None
    • No
    • 0
    • WINC - Sprint 254, WINC - Sprint 255
    • 2
    • False
    • Previously, the contents of the kubelet-ca.crt file on Windows nodes were not populated correctly after the rotation of the kube-apiserver-to-kubelet-client-ca certificate. This fix corrects the issue.
    • Bug Fix
    • In Progress

      This is a clone of issue OCPBUGS-22237. The following is the description of the original issue:

      Description of problem:

      We replaced our self-signed certificate with one issued by a trusted CA in OpenShift. Everything worked fine: the Linux nodes were updated, and so were the router and the API.
      
      After the change, we see the following entries in the Windows kubelet logs:
      ~~~
      w-amd-342-zw4l2 E0921 23:27:36.567178    1852 dynamic_cafile_content.go:237] key failed with : open C:\k\kubelet-ca.crt: The process cannot access the file because it is being used by another process.
      w-amd-342-zw4l2 I0921 23:27:36.569637    1852 dynamic_cafile_content.go:119] "Loaded a new CA Bundle and Verifier" name="client-ca-bundle::C:\\k\\kubelet-ca.crt"
      w-amd-342-zw4l2 E0921 23:27:36.818376    1852 server.go:299] "Unable to authenticate the request due to an error" err="verifying certificate SN=294212011553307207576684339170806169001, SKID=, AKID=B4:69:72:1F:BA:7D:6F:de:9B:4A:A8:AA:58:56:77:87:29:22:CD:92 failed: x509: certificate signed by unknown authority"
      ~~~
      
      *Observations*:
      ===========
       We found that the kubelet-ca.crt certificate is not updated on the older Windows nodes.
      
      To isolate the issue during the call with the customer (CU), we created a new Windows workload and found that the "x509: certificate signed by unknown authority" error did not appear after its creation.
      
      We then compared the C:\k\kubelet-ca.crt files on the old and new machines, found differences, and concluded that kubelet-ca.crt was not updated on the older Windows machine.
      
      After some time (around 2 hours), we started seeing the "x509: certificate signed by unknown authority" errors. It looks like nodes are created with the correct kubelet-ca.crt, but a few hours later the file is reverted to the old one.
      
      As a workaround, we downloaded the C:\k\kubelet-ca.crt file from the new Windows machine and copied it to one of the old Windows workload machines. After we rebooted the old Windows machine, the error disappeared for some time, but then the file was reverted to the older kubelet-ca.crt again.
      
      This shows that C:\k\kubelet-ca.crt is not updated properly on the older Windows instances. On the new Windows machine, C:\k\kubelet-ca.crt is correct initially, but it is later reverted to the older kubelet-ca.crt.
      
      

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Every time a new Windows node is added to the cluster or an existing Windows node is re-created.

      Steps to Reproduce:

      1. Check the contents of C:\k\kubelet-ca.crt on each Windows node.
      
      2. Rotate the kube-apiserver-to-kubelet-signer certificate:
      $ oc patch -n openshift-kube-apiserver-operator secret kube-apiserver-to-kubelet-signer --type='json' -p='[{"op": "replace", "path": "/metadata/annotations/auth.openshift.io~1certificate-not-after","value": null }]'
      
      3. After a short time, check the new contents of C:\k\kubelet-ca.crt on each Windows node.
       

      Actual results:

      After the change, they started seeing "x509: certificate signed by unknown authority" in the Windows kubelet logs:
      ~~~
      w-amd-342-zw4l2 E0921 23:27:36.567178    1852 dynamic_cafile_content.go:237] key failed with : open C:\k\kubelet-ca.crt: The process cannot access the file because it is being used by another process.
      w-amd-342-zw4l2 I0921 23:27:36.569637    1852 dynamic_cafile_content.go:119] "Loaded a new CA Bundle and Verifier" name="client-ca-bundle::C:\\k\\kubelet-ca.crt"
      w-amd-342-zw4l2 E0921 23:27:36.818376    1852 server.go:299] "Unable to authenticate the request due to an error" err="verifying certificate SN=294212011553307207576684339170806169001, SKID=, AKID=B4:69:72:1F:BA:7D:6F:de:9B:4A:A8:AA:58:56:77:87:29:22:CD:92 failed: x509: certificate signed by unknown authority"
      ~~~
      The file C:\k\kubelet-ca.crt either has the same contents as before the rotation, or has had some certificates removed and new ones added.
      

      Expected results:

      The file C:\k\kubelet-ca.crt retains the certificates present in it before the rotation, and contains the new certificates as well.
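      The expected result amounts to a subset check: every certificate in the pre-rotation file must still be present in the post-rotation file. A minimal POSIX shell sketch of that check, assuming copies of C:\k\kubelet-ca.crt taken from a node before and after the rotation are available locally; `pem_blocks` and `retains_old_certs` are hypothetical helper names, not part of the Windows Machine Config Operator or the `oc` CLI:

```shell
#!/bin/sh
# Hypothetical helpers (not part of WMCO or oc) for checking that a rotated
# kubelet-ca.crt still contains every certificate it held before rotation.

# Print each PEM certificate in file $1 as a single line. Carriage returns are
# stripped so copies taken from Windows (CRLF line endings) compare cleanly.
pem_blocks() {
  awk '
    { sub(/\r$/, "") }
    /-----BEGIN CERTIFICATE-----/ { blk = ""; inblk = 1 }
    inblk { blk = blk $0 }
    /-----END CERTIFICATE-----/ { if (inblk) print blk; inblk = 0 }
  ' "$1"
}

# Return 0 if every certificate in $1 (old bundle) also appears in $2
# (new bundle); otherwise report how many are missing and return 1.
retains_old_certs() {
  old_tmp=$(mktemp) || return 2
  new_tmp=$(mktemp) || { rm -f "$old_tmp"; return 2; }
  pem_blocks "$1" > "$old_tmp"
  pem_blocks "$2" > "$new_tmp"
  # grep -vxFf prints old blocks that have no exact match in the new bundle.
  missing=$(grep -vxFf "$new_tmp" "$old_tmp" | wc -l | tr -d ' ')
  rm -f "$old_tmp" "$new_tmp"
  if [ "$missing" -eq 0 ]; then
    echo "OK: all certificates from $1 are present in $2"
    return 0
  fi
  echo "FAIL: $missing certificate(s) from $1 missing in $2" >&2
  return 1
}
```

      For example, `retains_old_certs kubelet-ca.before.crt kubelet-ca.after.crt` (hypothetical file names) returns non-zero when the rotation dropped a certificate, which is the behavior described under Actual results. The sketch compares PEM text rather than decoded certificates, so the same certificate re-encoded with different line wrapping would be reported as missing; comparing `openssl x509 -fingerprint` output per certificate would avoid that at the cost of an extra dependency.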
      

      Additional info:

      With the first customer, this issue was triggered right after patching a custom CA for the API and Ingress. This could be coincidental, because the custom CA is technically unrelated to kubelet certificates. We now have another customer who did not report patching the Ingress or API certificates, yet the kubelet still fails to validate the CA certificate.

            Sebastian Soto (rh-ee-ssoto)
            OpenShift Prow Bot (openshift-crt-jira-prow)
            Aharon Rasouli
            Votes: 0
            Watchers: 5
