Bug
Resolution: Not a Bug
Critical
None
4.12
Important
No
Rejected
False
Customer Escalated
6/7: telco reviewed
Description of problem:
When installing a vCU app from a large Helm chart, we get "http2: client connection lost" errors. The kube-apiserver logs contain many "timeout or abort while handling" errors, and the API becomes unresponsive, which slows down the whole cluster. Because of this, the Helm install sometimes fails, and some nodes have gone into the NotReady state while the API was unresponsive. etcd also logs "took too long" messages, even though fio performance tests show very good results, including while the Helm install is running. The customer tested with 4.8 and reports that the problem does not occur in that version.
Version-Release number of selected component (if applicable):
How reproducible:
In the customer environment, whenever they install or uninstall their large Helm chart.
Steps to Reproduce:
1. 2. 3.
Actual results:
The Helm chart sometimes fails to install. The install can also push nodes into the NotReady state because the API becomes unresponsive.
Expected results:
The Helm chart installs successfully without degrading the cluster.
Additional info:
The CU environment is BareMetal IPI. Installed operators:
NAME                                                      AGE
local-storage-operator.openshift-local-storage            7d
mcg-operator.openshift-storage                            7d
metallb-operator.metallb-system                           7d
ocs-operator.openshift-storage                            7d
odf-csi-addons-operator.openshift-storage                 7d
odf-operator.openshift-storage                            7d
sriov-network-operator.openshift-sriov-network-operator   7d
This is not a duplicate of OCPBUGS-2474: none of the cluster operators report a MissingStaticPodControllerDegraded error.