Enable healthcheck of stale node-registration sockets

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Normal
    • 4.16.0
    • 4.15.0
    • Storage / Operators

      Following up from OCPBUGS-16357, we should enable health check of stale registration sockets in our operators.

      We will need https://github.com/kubernetes-csi/node-driver-registrar/pull/322, and we will have to enable the health check for registration sockets as described in https://github.com/kubernetes-csi/node-driver-registrar#example
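
      To illustrate what enabling the health check could look like, here is a minimal sketch that builds a livenessProbe fragment for the registrar sidecar. The `--kubelet-registration-path` and `--mode=kubelet-registration-probe` flags come from the upstream node-driver-registrar README; the driver name, socket path, and timing values are placeholders assumed for illustration, not the values used by any OpenShift operator.

```python
import json

# Hypothetical driver name; substitute the real CSI driver name.
DRIVER_NAME = "drivername.example.com"
# Assumed kubelet plugin socket path, following the upstream README layout.
REGISTRATION_PATH = f"/var/lib/kubelet/plugins/{DRIVER_NAME}/csi.sock"


def registrar_liveness_probe(registration_path: str) -> dict:
    """Return a livenessProbe fragment that runs the registrar in probe mode."""
    return {
        "exec": {
            "command": [
                "/csi-node-driver-registrar",
                f"--kubelet-registration-path={registration_path}",
                "--mode=kubelet-registration-probe",
            ]
        },
        # Timing values are illustrative, not recommendations.
        "initialDelaySeconds": 30,
        "timeoutSeconds": 15,
    }


if __name__ == "__main__":
    fragment = {"livenessProbe": registrar_liveness_probe(REGISTRATION_PATH)}
    # JSON is valid YAML, so the output can be pasted into a container spec.
    print(json.dumps(fragment, indent=2))
```

      The fragment would sit under the registrar container in the CSI node DaemonSet. When the probe fails, the kubelet restarts that container, which is the behavior the health check relies on.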

            [OCPBUGS-26924] Enable healthcheck of stale node-registration sockets

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Critical: OpenShift Container Platform 4.16.0 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:0041

            Wei Duan added a comment -

            This is not an answer to the backport plan question. I just checked the csi-node-driver-registrar in 4.15, and it contains the fix (https://github.com/kubernetes-csi/node-driver-registrar/pull/322), so if you use a third-party CSI driver, the CSI node configuration needs to be modified.
            The fix has not gone into 4.14 yet.
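
            For third-party drivers, one way to see whether the node configuration still needs the change is to inspect the registrar sidecar's liveness probe. The sketch below is an assumption-laden illustration: it treats a pod spec as a plain dictionary, looks for the `--mode=kubelet-registration-probe` flag from the upstream README, and uses a made-up pod spec and container names.

```python
PROBE_MODE_FLAG = "--mode=kubelet-registration-probe"


def has_registration_probe(container: dict) -> bool:
    """True if the container's exec liveness probe runs the registrar in probe mode."""
    probe = container.get("livenessProbe") or {}
    command = (probe.get("exec") or {}).get("command") or []
    return PROBE_MODE_FLAG in command


def registrars_missing_probe(pod_spec: dict, registrar_names: set) -> list:
    """Names of registrar sidecars in pod_spec that lack the registration probe."""
    return [
        c["name"]
        for c in pod_spec.get("containers", [])
        if c.get("name") in registrar_names and not has_registration_probe(c)
    ]


# Hypothetical pod spec for a third-party CSI node DaemonSet.
EXAMPLE_POD_SPEC = {
    "containers": [
        {"name": "csi-driver", "image": "example.com/csi-driver:latest"},
        {
            "name": "csi-node-driver-registrar",
            "args": ["--csi-address=/csi/csi.sock"],
            # No livenessProbe yet, so the check should flag this container.
        },
    ]
}

if __name__ == "__main__":
    print(registrars_missing_probe(EXAMPLE_POD_SPEC, {"csi-node-driver-registrar"}))
```

            The container names are passed in explicitly because third-party drivers do not necessarily name the sidecar the same way.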

            Wei Duan added a comment - - edited

            Checked the drivers with the latest csi-node-driver-registrar (v2.10.0):

            1. The issue (https://issues.redhat.com/browse/OCPBUGS-16357?focusedId=24091784&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-24091784) does not reproduce.
            2. The container is restarted thanks to the health check:

            vmware-vsphere-csi-driver-node-rkk6g   3/3   Running            4 (1s ago)    3h55m   10.19.46.231   wduan-vsphere-khgdl-worker-2-r5629   <none>   <none>
            vmware-vsphere-csi-driver-node-rkk6g   2/3   Error              4 (31s ago)   3h55m   10.19.46.231   wduan-vsphere-khgdl-worker-2-r5629   <none>   <none>
            vmware-vsphere-csi-driver-node-rkk6g   2/3   CrashLoopBackOff   4 (6s ago)    3h55m   10.19.46.231   wduan-vsphere-khgdl-worker-2-r5629   <none>   <none>
            vmware-vsphere-csi-driver-node-rkk6g   3/3   Running            6 (3s ago)    3h56m   10.19.46.231   wduan-vsphere-khgdl-worker-2-r5629   <none>   <none>

            Marking as Verified.

            OpenShift Jira Bot added a comment -

            Hi hekumar@redhat.com,

            Bugs should not be moved to Verified without first providing a Release Note Type ("Bug Fix" or "No Doc Update"), and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the Bug to Verified.

            OpenShift Jira Bot added a comment -

            Looks like this bug is far enough along in the workflow that a code fix is ready. Customers and support need to know the backport plan. Please complete the "Target Backport Versions" field to indicate which version(s) will receive the fix.

            Hemant Kumar added a comment -

            It has to be enabled in each individual driver and operator.

              hekumar@redhat.com Hemant Kumar
              hekumar@redhat.com Hemant Kumar
              Wei Duan Wei Duan