OpenShift Bugs / OCPBUGS-6062

OLM's catalogsource pods require a manual force drain before a worker can reboot


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Normal
    • None
    • 4.12
    • OLM
    • None
    • Rejected
    • False
    • 1/31: telco review - pending severity

      Description of problem:

      When an OLM catalog source pod is scheduled on a node and that node is later drained, the drain fails with the following error:
      
      There are pending nodes to be drained: worker-075
      error: cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): openshift-marketplace/webscale-operators-7jdpv
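      
      To confirm that the catalog pod has no owner among those workload controllers, its ownerReferences can be inspected. This check is not part of the original report; the pod name is the one from the error above:
      
         $ oc get pod webscale-operators-7jdpv -n openshift-marketplace \
             -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}/{.name}{"\n"}{end}'
      
      None of the kinds that drain accepts (ReplicationController, ReplicaSet, Job, DaemonSet, StatefulSet) show up for these pods, which is why eviction refuses to proceed without --force.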
      
      (original report closed)
      https://github.com/operator-framework/operator-lifecycle-manager/issues/1514
      
      (report closed as a duplicate)
      https://github.com/operator-framework/operator-lifecycle-manager/pull/2814
      
      Based on the report history, this issue has been swept under the rug for the past 2-3 years.
      
      https://github.com/operator-framework/operator-lifecycle-manager/issues/2709
      
      The bug we're encountering, documented in those GitHub issues, is that OLM's catalogsource pods are not managed by any of the controllers that drain recognizes: ReplicationController, ReplicaSet, Job, DaemonSet, or StatefulSet.
      
      The pods are created standalone; this could be addressed in a future release of operator-lifecycle-manager. If it were, nodes would drain without requiring --force, which is the expected and desired behavior.
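      
      One possible interim workaround, not part of the original report and assuming the catalog operator's usual behavior of recreating registry pods for a CatalogSource, is to delete the stuck pod so the drain can complete without --force (pod and node names are the ones from this report):
      
         $ oc delete pod webscale-operators-7jdpv -n openshift-marketplace
         $ oc adm drain worker-075 --ignore-daemonsets --delete-emptydir-data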

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Very

      Steps to Reproduce:

      1. Pod 'openshift-marketplace/webscale-operators-7jdpv' lives on worker
      2. Worker starts to drain in anticipation of a reboot but gets stuck draining (see the drain sketch after this list)
      3. User has to manually force drain the worker 
         $ oc adm drain <node> --force --grace-period=0 --ignore-daemonsets --delete-emptydir-data --disable-eviction 
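      
      For a manual reproduction of step 2, a non-forced drain is expected to stop on the unmanaged catalog pod (node and pod names are the ones from this report; the error text is the one shown in the description):
      
         $ oc adm drain worker-075 --ignore-daemonsets --delete-emptydir-data
         error: cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): openshift-marketplace/webscale-operators-7jdpv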

      Actual results:

      Node gets stuck waiting to drain

      Expected results:

      Node drains successfully and reboots

      Additional info:

       

              rh-ee-dfranz Daniel Franz
              rhn-support-acardena Albert Cardenas
              bruno andrade bruno andrade