  1. OpenShift Bugs
  2. OCPBUGS-14158

Helm receives an "http2: client connection lost" message from the API while installing a large Helm chart

    • Important
    • No
    • Rejected
    • False
    • Customer Escalated
    • 6/7: telco reviewed

      Description of problem:

      When installing a vCU app from a large Helm chart, we get "http2: client connection lost".
      The kube-apiserver logs contain many "timeout or abort while handling" errors, and the API becomes unresponsive, slowing down the cluster.
      
      This behavior sometimes makes the helm install fail, and it has also caused some nodes to go into the NotReady state because the API was unresponsive.
      
      etcd also logs errors such as "took too long", even though the fio performance test shows very good results, including while the helm install is running.
      
      The customer tested with 4.8 and reports that the problem does not occur in that version.
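
To gauge how often each symptom appears while the chart is installing, the collected logs can be tallied with a small script. The following is only a minimal sketch: it matches the message fragments quoted in this report as plain substrings, assumes the etcd message reads "took too long", and assumes nothing else about the log line format. The function name `count_symptoms` and the pattern labels are hypothetical and do not come from any OpenShift tooling.

```python
# Message fragments quoted in this report. Matching is plain substring
# search; no particular log line layout is assumed.
PATTERNS = {
    "apiserver_timeout": "timeout or abort while handling",
    "etcd_slow": "took too long",  # assumed spelling of the etcd message
    "client_lost": "http2: client connection lost",
}


def count_symptoms(lines):
    """Return a dict mapping each pattern label to the number of lines containing it."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
    return counts
```

One way to use it is to pass an open log file, for example `count_symptoms(open(path))` where `path` points at a saved kube-apiserver or etcd log, and compare the counts from a window that covers the helm install against a quiet window.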

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      In the customer environment, when they install or uninstall their large Helm chart.

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      The Helm chart sometimes fails to install. The install can also push nodes into the NotReady state because the API becomes unresponsive.

      Expected results:

      Helm chart is installed successfully without hurting the cluster.

      Additional info:

      The customer (CU) environment is bare-metal IPI.
      
      Installed operators:
      NAME                                                      AGE
      local-storage-operator.openshift-local-storage            7d
      mcg-operator.openshift-storage                            7d
      metallb-operator.metallb-system                           7d
      ocs-operator.openshift-storage                            7d
      odf-csi-addons-operator.openshift-storage                 7d
      odf-operator.openshift-storage                            7d
      sriov-network-operator.openshift-sriov-network-operator   7d
      
      This is not a duplicate of OCPBUGS-2474: the cluster operators are not reporting any MissingStaticPodControllerDegraded error.

              akashem@redhat.com Abu H Kashem
              fcristin1@redhat.com Francesco Cristini
              Ke Wang Ke Wang
              Votes: 0
              Watchers: 11
