  1. Product Technical Learning
  2. PTL-13218

DO380v4.14 : ch06s05 : The deployments that are dropped during this lab's setup have a good chance of practically killing one of the nodes. It would be a good idea to dial down the amount of load dropped on it.


    • Bug
    • Resolution: Done
    • Major
    • DO380 - OCP4.14-en-1-20240220
    • DO380
    • None
    • False
    • False
    • 6
    • ROLE
    • en-US (English)

      Please fill in the following information:


      URL: https://rol.redhat.com/rol/app/courses/do380-4.14/pages/ch06s05
      Reporter RHNID: chetan-rhls
      Section Title: Lab: OpenShift Monitoring

      Issue description: The deployments that are dropped during this lab's setup have a good chance of practically killing one of the nodes. It would be a good idea to dial down the amount of load dropped on it, or to change what kind of alert is supposed to appear.

      I’ve experienced this twice.
      Worker03 could not handle the load, to the point where the cluster operators on this node stopped responding and the node was marked as unavailable. Given time it might have recovered, but I gave it 30 minutes to no avail. In the end I force-deleted the pods and restarted the node. The problem is that, because the cluster lost communication with the node, it did not even produce the desired alert.

       

      Steps to reproduce:

       

      Workaround:

       

      Expected result:

            rht-bgargallo Bernardo Andres Gargallo Jaquotot
            chetan-rhls Chetan Tiwary
