Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: ACM 2.9.0
Affects Version/s: ACM 2.9.0
Component/s: Installer
Labels:
- Train-05

Blocked:
False
Blocked Reason:
None
Ready:
False
Intelligence Requested:
Market:

Sprint:
Installer Sprint 23-14
Severity:
Critical
Customer Impact:

Customer Escalated

Regression:
No

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

Description of problem:

MCE has been in an Error phase since July 14th, after what appears to have been an upgrade to ACM 2.6.6 and MCE 2.1.7. The MCE resource is reporting:

rpc error: code = Unknown desc = malformed header: missing HTTP content-type

On July 24th the MCE operator was re-installed following the steps at https://access.redhat.com/solutions/6459071, but MCE has not recovered. The MCE operator pod logs show repeating stream errors such as:

2023-07-24T17:36:58.288825855Z 1.6902202182887614e+09 DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/validate-multicluster-openshift-io-v1-multiclusterengine", "code": 200, "reason": "", "UID": "beee0c21-e368-465a-99d9-b7b8da16b1be", "allowed": true}
2023-07-24T17:37:01.897230172Z W0724 17:37:01.897170 1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262: failed to list *v1.ConfigMap: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 1463; INTERNAL_ERROR; received from peer
2023-07-24T17:37:01.897300566Z I0724 17:37:01.897233 1 trace.go:205] Trace[590526907]: "Reflector ListAndWatch" name:sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262 (24-Jul-2023 17:36:00.718) (total time: 61178ms):
2023-07-24T17:37:01.897300566Z Trace[590526907]: ---"Objects listed" error:stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 1463; INTERNAL_ERROR; received from peer 61178ms (17:37:01.897)
2023-07-24T17:37:01.897300566Z Trace[590526907]: [1m1.178279099s] [1m1.178279099s] END
2023-07-24T17:37:01.897300566Z E0724 17:37:01.897251 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: stream error when reading response body, may be caused by closed connection. Please retry. Original error: stream error: stream ID 1463; INTERNAL_ERROR; received from peer
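For triage, the repeating reflector errors can be tallied from a saved copy of the operator pod log. The following is a minimal sketch only: the function name `count_stream_errors` and the regular expression are illustrative assumptions based on the log lines quoted above, not part of any MCE or ACM tooling.

```python
import re
from collections import Counter

# Assumed pattern, derived from the reflector lines quoted in this report:
#   ... reflector.go:138] <source>: Failed to watch *v1.ConfigMap: ...
#   ... stream ID 1463; INTERNAL_ERROR; received from peer
STREAM_ERROR = re.compile(
    r"reflector\.go:\d+\].*?[Ff]ailed to (?:list|watch) (?P<kind>\*\S+?):"
    r".*?stream ID (?P<stream_id>\d+); (?P<code>[A-Z_]+)"
)


def count_stream_errors(lines):
    """Count reflector stream errors per (resource kind, HTTP/2 error code)."""
    counts = Counter()
    for line in lines:
        match = STREAM_ERROR.search(line)
        if match:
            counts[(match.group("kind"), match.group("code"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_stream_errors.py < operator-pod.log
    for (kind, code), n in count_stream_errors(sys.stdin).most_common():
        print(f"{kind} {code} {n}")
```

Only lines emitted by `reflector.go` are counted, so the accompanying `trace.go` lines for the same failure are not double-counted.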

The shift support team investigated due to concerns over the cluster health and reported:

etcd response rate for the cluster is pretty bad, and they seem to have an issue with volumes for ODF, but I don't see any signs of an issue with cluster health. All nodes are ready, minimum specs met, MCP up to date, all pods Ready or Completed.

Version-Release number of selected component (if applicable):

ACM 2.6.6 / MCE 2.1.7

How reproducible:

Not reproduced in the lab.

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Is cloned by:
ACM-6911 [ACM-2.8] MCE in Error Phase after upgrade to ACM 2.6.6 MCE 2.1.7 (Closed)

Links to:
RHEA-2023:115260 Red Hat Advanced Cluster Management 2.9.0 bug fixes and container updates

Assignee: Jakob Gray
Reporter: James Young
QA Contact: Ting Xue
Votes: 0
Watchers: 4
Created: 2023/07/25 6:30 PM
Updated: 2023/11/29 9:10 PM
Resolved: 2023/09/07 7:06 PM
