-
Bug
-
Resolution: Unresolved
-
Normal
-
None
-
None
-
None
-
8
-
False
-
-
False
-
No Docs Impact
-
rhos-conplat-core-operators
-
None
-
Release Note Not Required
-
-
-
Low
https://github.com/openstack-k8s-operators/watcher-operator/pull/162 adds the following PoC job to test EDPM content on CentOS Stream 10:
- nodeset:
    name: centos-9-medium-2x-centos-9-crc-cloud-ocp-4-18-1-xxl-vexxhost
    nodes:
      - name: controller
        label: cloud-centos-9-stream-tripleo-vexxhost-medium
      - name: compute-0
        label: cloud-centos-9-stream-tripleo-vexxhost
      - name: compute-1
        label: cloud-centos-9-stream-tripleo-vexxhost
      - name: crc
        label: crc-cloud-ocp-4-18-1-xxl
    groups:
      - name: computes
        nodes:
          - compute-0
          - compute-1
      - name: ocps
        nodes:
          - crc

- job:
    name: podified-multinode-edpm-deployment-crc-2comp-cs10
    parent: podified-multinode-edpm-deployment-crc-2comp
    nodeset: centos-9-medium-2x-centos-9-crc-cloud-ocp-4-18-1-xxl-vexxhost
    vars:
      cifmw_update_containers_openstack: true
      cifmw_update_containers_use_valkey: true
      cifmw_update_containers_org: podified-master-centos10
      cifmw_update_containers_registry: quay.rdoproject.org
      cifmw_update_containers_tag: 0e75cb30c06f5bce6a42ee75c7be5c50
      cifmw_update_containers: true
      cifmw_extras:
        - "@{{ ansible_user_dir }}/{{ zuul.projects['github.com/openstack-k8s-operators/ci-framework']. src_dir }}/scenarios/centos-9/multinode-ci.yml"
        - "@{{ ansible_user_dir }}/{{ zuul.projects['github.com/openstack-k8s-operators/ci-framework']. src_dir }}/scenarios/centos-9/horizon.yml"
        - "@{{ ansible_user_dir }}/{{ zuul.projects['github.com/openstack-k8s-operators/watcher-operator']. src_dir }}/ci/scenarios/edpm.yml"
The control plane deployment failed, with `cinder-scheduler-0` going into the `CrashLoopBackOff` state.
After checking the `cinder-scheduler-0` pod log, we found the following error:
2025-05-14 05:25:13.864 1 ERROR oslo.messaging._drivers.impl_rabbit [None req-415dae18-0836-4781-8d8d-4dd1a8595d7f - - - - - -] Connection failed: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1010) (retrying in 1.0 seconds): ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1010)
We then looked at the rabbitmq-server pod log and found the following error:
2025-05-14 01:32:57.881294-04:00 [notice] <0.9124.0> TLS server: In state wait_finished at tls_record_1_3.erl:213 generated SERVER ALERT: Fatal - Bad Record MAC
2025-05-14 01:32:57.881294-04:00 [notice] <0.9124.0> - {record_type_mismatch,21}
The rabbitmq-server pod uses the `quay.rdoproject.org/podified-master-centos10/openstack-rabbitmq:0e75cb30c06f5bce6a42ee75c7be5c50` container image, which installs `rabbitmq-server x86_64 3.13.7` and `erlang-asn1 x86_64 26.2.5-1.el10` from the DLRN master deps repo in the tcib job.
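The two error signatures quoted above can be pulled out of longer pod logs with a small filter. This is only a sketch: `tls_errors` is a hypothetical helper name, and the patterns are simply the strings seen in this report, not an exhaustive list of TLS failures.

```shell
# Hypothetical helper (not part of install_yamls or ci-framework).
# Reads pod log text on stdin and prints only the lines that carry one of
# the TLS failure signatures quoted in this report.
tls_errors() {
    esc=$(printf '\033')
    # Strip ANSI colour sequences such as ESC[38;5;87m before matching.
    sed "s/${esc}\[[0-9;]*m//g" |
        grep -E 'CERTIFICATE_VERIFY_FAILED|SERVER ALERT|record_type_mismatch'
}
```

For example, assuming the pods run in the `openstack` namespace (the install_yamls default, not confirmed in this report): `oc logs cinder-scheduler-0 -n openstack | tls_errors`.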
Based on a discussion with Luca Miccini in a Slack thread, this may be caused by a bogus certificate, or CentOS Stream 10 may enforce some RSA/DSA settings that do not play well with RabbitMQ. A reproducer is needed to investigate further.
Note: If we deploy the control plane with TLS disabled, the control plane deployment succeeds.
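To help narrow down the certificate hypothesis above, the certificate and CA bundle used by RabbitMQ could be inspected with `openssl`. The sketch below assumes both have already been saved as PEM files; `cert_algorithms` and `chain_ok` are hypothetical helper names, and how the files are extracted from the cluster is left to the reader.

```shell
# Hypothetical helpers; the PEM file paths are supplied by the caller.

# Print the signature and public-key algorithms recorded in a certificate.
cert_algorithms() {
    openssl x509 -in "$1" -noout -text |
        grep -E 'Signature Algorithm|Public Key Algorithm|Public-Key' |
        sed 's/^ *//' | sort -u
}

# Return 0 if the certificate ($2) verifies against the CA bundle ($1).
chain_ok() {
    openssl verify -CAfile "$1" "$2" >/dev/null 2>&1
}
```

A call such as `cert_algorithms tls.crt` shows whether the key is RSA or something else and which signature hash is in use, and `chain_ok ca.crt tls.crt || echo "chain does not verify"` checks the chain outside of the services. On a CentOS Stream host, `update-crypto-policies --show` (if the tool is installed) reports the active system-wide crypto policy, which may be worth comparing between CentOS Stream 9 and 10.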
Below is a reproducer for the issue on a CentOS Stream 9 install_yamls dev box:
git clone https://github.com/openstack-k8s-operators/install_yamls.git
cd install_yamls/devsetup
make download_tools
CPUS=12 MEMORY=25600 DISK=100 make crc
eval $(crc oc-env)
oc login -u kubeadmin -p 12345678 https://api.crc.testing:6443
make crc_attach_default_interface
cd ..
make crc_storage
make input
make openstack
make openstack_init
# Add quay.rdoproject.org to the insecure registries
oc patch --type=merge --patch='{"spec": {"registrySources": {"insecureRegistries": ["quay.rdoproject.org"]}}}' image.config.openshift.io/cluster
oc patch --type=merge --patch='{"spec": {"registrySources": {"allowedRegistries": ["quay.rdoproject.org","quay.io","gcr.io","registry.redhat.io","image-registry.openshift-image-registry.svc:5000"]}}}' image.config.openshift.io/cluster
# Download the attached update_containers.yml file to use the CS10 master containers
oc apply -f update_containers.yml
make openstack_deploy
# Wait about 20 minutes until the cinder-scheduler-0 and rabbitmq-server pods are running.
# Check the logs of both pods; the relevant error messages appear there.
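The final step of the reproducer (checking the logs of both pods) could be scripted roughly as follows. This is a sketch under assumptions: `collect_pod_logs` is a hypothetical helper, and the namespace and exact pod names are passed in by the caller because this report does not state them.

```shell
# Hypothetical helper: save the current and previous container logs of the
# given pods into a directory for later inspection.
# Usage: collect_pod_logs <namespace> <output-dir> <pod>...
collect_pod_logs() {
    ns=$1
    outdir=$2
    shift 2
    mkdir -p "$outdir"
    for pod in "$@"; do
        # Log of the currently running container instance.
        oc logs "$pod" -n "$ns" > "$outdir/$pod.log" 2>&1 || true
        # A pod in CrashLoopBackOff restarts often; --previous returns the
        # log of the container instance that ran before the last restart.
        oc logs "$pod" -n "$ns" --previous > "$outdir/$pod.previous.log" 2>&1 || true
    done
}
```

For example, `collect_pod_logs openstack ./tls-logs cinder-scheduler-0`, assuming the `openstack` namespace used by install_yamls by default; the rabbitmq-server pod name can be appended once it is known from `oc get pods -n openstack`.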
- depends on
-
OSPRH-16663 Run Master EDPM job on CentOS Stream 10
-
- Closed
-
- links to