Type: Bug
Resolution: Not a Bug
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.12
Component/s: kube-apiserver
Labels:

Severity: Important
Regression: No
Release Blocker: Rejected
Blocked: False
Blocked Reason: None
Customer Impact: Customer Escalated
Internal Whiteboard:
Latest Status Summary: 6/7: telco reviewed

Description of problem:

When installing a vCU app with a large helm chart, we get "http2: client connection lost" errors.
The kube-apiserver logs contain many "timeout or abort while handling" errors, and the API becomes unresponsive, slowing down the cluster.

This behavior sometimes makes the helm install fail, and it has also caused some nodes to go into the NotReady state because the API was unresponsive.

etcd also logs "took too long" messages, even though the fio performance test shows very good results, including during the helm install.

The customer tested with 4.8 and reports that the problem does not occur in that version.
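
To put numbers on the symptoms described above, the quoted messages can be tallied in collected log files. The following is only a minimal sketch: the function name `count_markers` and the idea of passing log file paths on the command line are assumptions for illustration, not part of any OpenShift tooling, and the etcd fragment is spelled "took too long" on the assumption that this is how etcd words its slow-request warnings.

```python
import sys
from collections import Counter
from typing import Iterable, Sequence

# Message fragments quoted in this report, matched as case-insensitive
# substrings. The etcd fragment assumes the spelling "took too long".
MARKERS = (
    "http2: client connection lost",
    "timeout or abort while handling",
    "took too long",
)


def count_markers(lines: Iterable[str], markers: Sequence[str] = MARKERS) -> Counter:
    """Count how many lines contain each marker fragment."""
    counts = Counter({marker: 0 for marker in markers})
    for line in lines:
        lowered = line.lower()
        for marker in markers:
            if marker.lower() in lowered:
                counts[marker] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: pass previously collected log files as arguments.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for marker, hits in count_markers(handle).items():
                print(f"{path}\t{marker}\t{hits}")
```

Running this against kube-apiserver and etcd logs captured before, during, and after a helm install would show whether the error counts actually rise with the install, which is also a simple way to compare the 4.8 and 4.12 clusters.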

Version-Release number of selected component (if applicable):

How reproducible:

In the customer environment, when they install or uninstall their large helm chart.

Steps to Reproduce:

1.
2.
3.

Actual results:

The helm chart sometimes fails to install, and it can also put nodes into the NotReady state due to API unresponsiveness.

Expected results:

The helm chart installs successfully without degrading the cluster.

Additional info:

CU env is BareMetal IPI.

Installed operators:
NAME                                                      AGE
local-storage-operator.openshift-local-storage            7d
mcg-operator.openshift-storage                            7d
metallb-operator.metallb-system                           7d
ocs-operator.openshift-storage                            7d
odf-csi-addons-operator.openshift-storage                 7d
odf-operator.openshift-storage                            7d
sriov-network-operator.openshift-sriov-network-operator   7d

This is not a duplicate of OCPBUGS-2474: the cluster operators are not reporting any MissingStaticPodControllerDegraded error.

Assignee: Abu H Kashem
Reporter: Francesco Cristini
QA Contact: Ke Wang
Votes: 0
Watchers: 11
Created: 2023/05/26 2:09 PM
Updated: 2024/06/13 10:53 PM
Resolved: 2023/06/08 11:16 AM