OpenShift Bugs · OCPBUGS-58027

Graceful shutdown of compute nodes failed to complete the node drain step
    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version: 4.19.0
    • Component: Node / Kubelet
    • Quality / Stability / Reliability
    • Sprint: OCP Node Sprint 276 (blue), OCP Node Sprint 277 (blue)

      Description of problem:

      The environment is an OpenShift Container Platform (OCP) 4.19 cluster with Container-native Virtualization (CNV) 4.19 and OpenShift Data Foundation (ODF) 4.19 installed. CNV workloads have been deployed in the cluster. The graceful shutdown process was attempted by following the official documentation, but the node drain operation did not succeed as expected.
      Doc: https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/backup_and_restore/graceful-shutdown-cluster#graceful-shutdown-cluster

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1. Deploy an OpenShift 4.19 cluster with ODF and CNV 4.19 installed.
          2. Create CNV workloads:
              - Deploy VMs using DataVolumes (DV) and DataVolumeTemplates (DVT).
              - Create cloned VMs.
              - Restore VMs from snapshots.
          3. Gracefully stop all VM workloads prior to initiating the cluster shutdown procedure.
          4. Perform the graceful shutdown of compute nodes by following the steps outlined in the official documentation.
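For reference, the drain portion of the documented shutdown procedure amounts to cordoning and then draining each compute node with `oc adm`. The sketch below only prints the commands it would run (a dry run, so no cluster is contacted); the node names are illustrative placeholders, not names from this cluster:

```shell
#!/bin/sh
# Dry-run sketch of the graceful-shutdown drain step: print the oc
# commands rather than executing them. In a real run the node list
# would come from `oc get nodes`, not this hard-coded placeholder list.
for node in worker-0 worker-1 worker-2; do
  echo "oc adm cordon ${node}"
  echo "oc adm drain ${node} --ignore-daemonsets --delete-emptydir-data --force"
done
```

Removing the `echo` wrappers (and substituting real node names) turns this into the actual drain loop; it is during this loop, on the last compute node, that the reported hang occurs.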

      Actual results:

          The node drain process during graceful shutdown did not complete successfully; it hangs while draining the last compute node.

      Expected results:

      The node drain process during graceful shutdown should complete successfully.

      Additional info:

      Since all other nodes are already drained, there is no node left for the NooBaa database primary to be rescheduled on, so its PodDisruptionBudget (PDB) blocks the node drain.
      We attempted to stop the primary by putting the CNPG cluster into hibernation mode:
      oc annotate cluster noobaa-db-pg-cluster --overwrite cnpg.io/hibernation=on
      After setting this annotation, the CNPG cluster should no longer block the drain, but it still blocks the drain of the node.
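When a drain hangs like this, the blocking PDB can usually be identified from the drain error output together with the cluster's PDB list, and the hibernation annotation can be confirmed on the CNPG cluster resource. The sketch below is a hedged triage aid that only prints the commands (a dry run); the `noobaa-db-pg-cluster` name comes from this report, while `<last-compute-node>` is a placeholder left unfilled:

```shell
#!/bin/sh
# Dry-run triage sketch: print the commands used to find which
# PodDisruptionBudget still reports zero allowed disruptions, to
# confirm the CNPG hibernation annotation was applied, and to retry
# the drain. None of these commands are executed here.
echo "oc get pdb -A"
echo "oc get cluster noobaa-db-pg-cluster -o yaml | grep hibernation"
echo "oc adm drain <last-compute-node> --ignore-daemonsets --delete-emptydir-data --force"
```

A PDB whose ALLOWED DISRUPTIONS column shows 0 for the NooBaa database pod would be consistent with the blocked-drain behavior described above.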

              qiwan233 Qi Wang
              rh-ee-asagare Avdhoot Sagare