OpenShift Bugs / OCPBUGS-24274

API server restart during stress-ng load testing left the cluster inaccessible

Details

    • Critical
    • No
    • False

    Description

      • During a 12-hour stability test with stress-ng at 60% load, the API server restarted and never recovered. As a result, all the stress pods kept running and continued to load the cluster. The console showed that a cgroup OOM had occurred. Master 0 and master 2 could not be reached via SSH. Master 1 was reachable but could not execute any oc command. After master 2 was restarted via KVM, it was possible to delete the stress deployment. However, master 0 and master 2 had to be restarted again before the cluster became more stable.

      • It was confirmed that the 4.14.4 set in use was identical to the GA version.
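
      One way to cross-check the cgroup OOM seen on the console, on a node that is still reachable, is to read the kernel's cgroup v2 `memory.events` counters. The following is a minimal sketch and not part of the original report: it assumes the node uses cgroup v2, and the directory passed to `oom_kill_count` (for example `/sys/fs/cgroup/kubepods.slice`) is a hypothetical path that may differ on a given cluster.

```python
from pathlib import Path


def parse_memory_events(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 memory.events file."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Each valid line is a counter name followed by a non-negative integer.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def oom_kill_count(cgroup_dir: str) -> int:
    """Return the oom_kill counter for a cgroup directory (0 if absent).

    cgroup_dir is an assumption supplied by the caller, e.g. a
    hypothetical '/sys/fs/cgroup/kubepods.slice'.
    """
    events = Path(cgroup_dir) / "memory.events"
    return parse_memory_events(events.read_text()).get("oom_kill", 0)
```

      A non-zero `oom_kill` value means at least one process in that cgroup was killed by the OOM killer. Comparing the counter before and after a stress run would show whether the load itself triggered the kills.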

          People

            Unassigned
            Vishvranjan Mishra (rhn-support-vismishr)
            Xingxing Xia
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved: