Data Foundation Bugs / DFBUGS-505

[2293058] [Ceph] ceph cluster reported as not healthy - PodDisruptionBudgetAtLimit error reached


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Version: odf-4.16
    • Component: rook

      Description of problem:
      PodDisruptionBudgetAtLimit errors are reported and the Ceph cluster is unhealthy.

      Version-Release number of selected component (if applicable):
      Catalog source: rook-ceph-operator-stable-4.16-odf-catalogsource-openshift-marketplace
      Installed ClusterServiceVersion (CSV): odf-operator.v4.16.0-118.stable (rook-ceph-operator.v4.16.0-118.stable)
      Starting version: odf-operator.v4.16.0-94.stable
      Ceph image: registry.redhat.io/rhceph/rhceph-7-rhel9@sha256:17e899c9c4f2f64bc7acea361446a64927b829d6766e6dde42f8d0336b9125a4
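      For reference, the installed operator versions above can be cross-checked from the CLI. This is a generic sketch; it assumes ODF is installed in the default openshift-storage namespace.

      # List the installed ClusterServiceVersions and their phases.
      oc get csv -n openshift-storage

      # Show the version recorded in a specific CSV (name taken from the list above).
      oc get csv odf-operator.v4.16.0-118.stable -n openshift-storage -o jsonpath='{.spec.version}'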

      How reproducible:
      Ongoing status reported

      Steps to Reproduce:
      1. I have a Regional DR OpenShift Virtualization managed cluster as part of the Regional DR environment.
      2. Following an ODF upgrade, the MCO operator reconciles the VeleroNamespaceSecretKeyRef and CACertificates fields, as reported in https://bugzilla.redhat.com/show_bug.cgi?id=2277941.
      3. I reconfigured the CACertificates.
      4. After this, I noticed that the Ceph cluster was reported as not healthy, with PodDisruptionBudgetLimit errors:

      PodDisruptionBudgetLimit
      Jun 7, 2024, 11:09 PM
      The pod disruption budget is below the minimum disruptions allowed level and is not satisfied. The number of current healthy pods is less than the desired healthy pods.

      Summary
      The pod disruption budget registers an insufficient number of pods.
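      Per the alert text above, the budget is unsatisfied because fewer pods are healthy than the budget requires. A quick way to see which budget is exhausted is to list the PodDisruptionBudgets that Rook manages for the Ceph daemons. The sketch below assumes the default openshift-storage namespace, and the rook-ceph-osd PDB name is the usual Rook default rather than something confirmed from this cluster.

      # Show all PodDisruptionBudgets and their ALLOWED DISRUPTIONS.
      oc get pdb -n openshift-storage

      # Inspect the OSD budget in detail (current vs. desired healthy pods).
      oc describe pdb rook-ceph-osd -n openshift-storage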

      Actual results:
      The Ceph cluster is reported as not healthy.

      Expected results:
      The Ceph cluster should be healthy.

      Additional info:
      Data Foundation events reported:
      failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd-virtualization": rpc error: code = DeadlineExceeded desc = context deadline exceeded
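      The DeadlineExceeded provisioning failures are consistent with the Ceph cluster itself being unhealthy. As a hedged sketch (it assumes the Rook toolbox has been enabled and runs as the rook-ceph-tools deployment, the usual ODF default name), the cluster state can be inspected directly:

      # Overall cluster health, mon quorum, and OSD up/in counts.
      oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph status

      # Per-OSD view: which OSDs are down and where they live.
      oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph osd tree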

      Ceph OSD pods in CrashLoopBackOff (CLBO) status; startup log:
      + child_pid=
      + sigterm_received=false
      + trap sigterm SIGTERM
      + child_pid=1035934
      + wait 1035934
      + ceph-osd -foreground --id 2 --fsid c48929dd-4981-4e8a-b7b1-03751eb8eba3 --setuser ceph --setgroup ceph 'crush-location=root=default host=rdr-c2-gxmhx-worker-0-blw77 zone=nova' --osd-op-num-threads-per-shard=2 --osd-op-num-shards=8 --osd-recovery-sleep=0 --osd-snap-trim-sleep=0 --osd-delete-sleep=0 --bluestore-min-alloc-size=4096 --bluestore-prefer-deferred-size=0 --bluestore-compression-min-blob-size=8192 --bluestore-compression-max-blob-size=65536 --bluestore-max-blob-size=65536 --bluestore-cache-size=3221225472 --bluestore-throttle-cost-per-io=4000 --bluestore-deferred-batch-ops=16 --default-log-to-stderr=true --default-err-to-stderr=true --default-mon-cluster-log-to-stderr=true '-default-log-stderr-prefix=debug ' --default-log-to-file=false --default-mon-cluster-log-to-file=false --ms-learn-addr-from-peer=false --public-addr=242.1.255.251 --public-bind-addr=10.131.0.41 --cluster-addr=10.131.0.41
      debug 2024-06-19T10:22:41.500+0000 7f9c87c007c0 0 monclient(hunting): authenticate timed out after 300
      failed to fetch mon config (--no-mon-config to skip)
      + wait 1035934
      + ceph_osd_rc=1
      + '[' 1 -eq 0 ']'
      + exit 1
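      The failure mode in this log is the OSD timing out while authenticating to the monitors ("monclient(hunting): authenticate timed out"), i.e. it never reaches a mon from its public address (242.1.255.251, which looks like a Submariner Globalnet address in this Regional DR setup, though that is an inference from the log rather than a confirmed diagnosis). A minimal sketch for checking mon reachability, again assuming the default openshift-storage namespace:

      # List the monitor pods and the nodes/IPs they run on.
      oc -n openshift-storage get pods -l app=rook-ceph-mon -o wide

      # List the mon services; OSDs connect to these on ports 3300/6789.
      oc -n openshift-storage get svc -l app=rook-ceph-mon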

              sapillai Santosh Pillai
              kgoldbla Kevin Alon Goldblatt
              Neha Berry