Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version: odf-4.19
Description of problem - Provide a detailed description of the issue encountered, including logs/command-output snippets and screenshots if the issue is observed in the UI:
rook-ceph-rgw-ocs-storagecluster-cephobjectstore pod crashes with ODF v4.19.0-75 deployment on IBM Z with host networking enabled
The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):
IBM Z, Bare Metal
The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):
Internal Mode (Converged Provider and Internal mode)
The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):
OCP: 4.19.0-ec.4
ODF: v4.19.0-75
Does this issue impact your ability to continue to work with the product?
Yes
Is there any workaround available to the best of your knowledge?
No
Can this issue be reproduced? If so, please provide the hit rate
Yes
Can this issue be reproduced from the UI?
Yes
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Deploy OCP 4.19.0-ec.4
2. Deploy LSO and ODF v4.19.0-75
3. Update the odf-operator CSV .spec.provider to IBM for the converged Internal and Provider mode deployment
4. Create the StorageSystem with the host networking option enabled
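Step 3 can be sketched as a CSV patch. This is a hedged sketch, not the verified procedure: the CSV name is a placeholder to look up first, and it assumes the operator is installed in openshift-storage and that .spec.provider is the standard OLM provider object (with a name field):

```shell
# Find the installed odf-operator CSV name first; <odf-operator-csv-name>
# below is a placeholder, not an actual resource name.
oc get csv -n openshift-storage

# Merge-patch spec.provider.name to IBM, leaving the rest of the CSV intact.
oc patch csv <odf-operator-csv-name> -n openshift-storage \
  --type merge -p '{"spec":{"provider":{"name":"IBM"}}}'
```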
The exact date and time when the issue was observed, including timezone details:
Actual results:
rook-ceph-rgw-ocs-storagecluster-cephobjectstore pod is in CrashLoopBackOff (CLB) state
Expected results:
rook-ceph-rgw-ocs-storagecluster-cephobjectstore pod should be up and running
Logs collected and log location:
# oc logs rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-59d874f8jhn9 -f
Defaulted container "rgw" out of: rgw, log-collector, chown-container-data-dir (init)
+ exec radosgw --crush-location=host=worker-1-a3e18001-lnxero1-boe --keyring=/etc/ceph/keyring-store/keyring --default-log-to-stderr=true --default-err-to-stderr=true --default-mon-cluster-log-to-stderr=true '--default-log-stderr-prefix=debug ' --default-log-to-file=false --default-mon-cluster-log-to-file=false '--mon-host=[v2:172.23.235.17:3300],[v2:172.23.235.16:3300],[v2:172.23.235.15:3300]' --mon-initial-members=a,b,c --id=rgw.ocs.storagecluster.cephobjectstore.a --setuser=ceph --setgroup=ceph --foreground '--rgw-frontends=beast port=80 ssl_port=443 ssl_certificate=/etc/ceph/private/rgw-cert.pem ssl_private_key=/etc/ceph/private/rgw-key.pem' --rgw-mime-types-file=/etc/ceph/rgw/mime.types --rgw-realm=ocs-storagecluster-cephobjectstore --rgw-zonegroup=ocs-storagecluster-cephobjectstore --rgw-zone=ocs-storagecluster-cephobjectstore --rados-replica-read-policy=localize
debug 2025-04-17T14:09:43.690+0000 3ffb1a5ab00  0 deferred set uid:gid to 167:167 (ceph:ceph)
debug 2025-04-17T14:09:43.690+0000 3ffb1a5ab00  0 ceph version 19.2.1-120.el9cp (9d9d735fbda3c9cca21e066e3d8238ee9520d682) squid (stable), process radosgw, pid 4436
debug 2025-04-17T14:09:43.690+0000 3ffb1a5ab00  0 framework: beast
debug 2025-04-17T14:09:43.690+0000 3ffb1a5ab00  0 framework conf key: port, val: 80
debug 2025-04-17T14:09:43.690+0000 3ffb1a5ab00  0 framework conf key: ssl_port, val: 443
debug 2025-04-17T14:09:43.690+0000 3ffb1a5ab00  0 framework conf key: ssl_certificate, val: /etc/ceph/private/rgw-cert.pem
debug 2025-04-17T14:09:43.690+0000 3ffb1a5ab00  0 framework conf key: ssl_private_key, val: /etc/ceph/private/rgw-key.pem
debug 2025-04-17T14:09:43.690+0000 3ffb1a5ab00  1 init_numa not setting numa affinity
debug 2025-04-17T14:09:45.220+0000 3fe7a69f800  1 v1 topic migration: starting v1 topic migration..
debug 2025-04-17T14:09:45.220+0000 3fe7a69f800  1 v1 topic migration: finished v1 topic migration
debug 2025-04-17T14:09:45.280+0000 3ffb1a5ab00 -1 LDAP not started since no server URIs were provided in the configuration.
debug 2025-04-17T14:09:45.280+0000 3ffb1a5ab00  1 rgw main: Lua ERROR: failed to find luarocks
debug 2025-04-17T14:09:45.320+0000 3ffb1a5ab00  0 framework: beast
debug 2025-04-17T14:09:45.320+0000 3ffb1a5ab00  0 framework conf key: ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
debug 2025-04-17T14:09:45.320+0000 3ffb1a5ab00  0 framework conf key: ssl_private_key, val: config://rgw/cert/$realm/$zone.key
debug 2025-04-17T14:09:45.340+0000 3ffb1a5ab00  0 starting handler: beast
debug 2025-04-17T14:09:45.340+0000 3ffb1a5ab00 -1 failed to bind address 0.0.0.0:443: Address already in use
debug 2025-04-17T14:09:45.340+0000 3ffb1a5ab00 -1 ERROR: failed initializing frontend
debug 2025-04-17T14:09:45.340+0000 3ffb1a5ab00 -1 ERROR: initialize frontend fail, r = 98
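The failure in the log is a bind conflict: r = 98 is errno 98 (EADDRINUSE), and with host networking the RGW beast frontend binds ports 80/443 directly on the node, so anything already listening on 443 there crashes the pod. A quick way to confirm this on the affected node (a sketch only; the node name is taken from the --crush-location in the log above, and it assumes ss is available in the node image):

```shell
# Decode the return code from "ERROR: initialize frontend fail, r = 98":
python3 -c 'import errno, os; print(errno.errorcode[98], "->", os.strerror(98))'
# -> EADDRINUSE -> Address already in use

# Check which process on the node already owns port 443
# (host networking means the RGW pod binds on the node itself):
oc debug node/worker-1-a3e18001-lnxero1-boe -- chroot /host ss -tlnp 'sport = :443'
```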
Additional info:
Unable to provide must-gather logs due to the following error (I was able to collect the logs previously). Please let me know if you need any specific logs.
Error running must-gather collection: gather did not start for pod must-gather-jjbcn: unable to pull image: ImagePullBackOff: Back-off pulling image "quay.io/rhceph-dev/ocs-must-gather:latest-4.19": ErrImagePull: [rpc error: code = Canceled desc = copying system image from manifest list: copying config: context canceled, initializing source docker://quay.io/rhceph-dev/ocs-must-gather:latest-4.19: reading manifest latest-4.19 in quay.io/rhceph-dev/ocs-must-gather: unauthorized: access to the requested resource is not authorized]