  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-16670

TLS server: In state wait_finished at tls_record_1_3.erl:213 generated SERVER ALERT: Fatal - Bad Record MAC error in rabbitmq-server pod in CentOS Stream 10 EDPM POC job


    • No Docs Impact
    • rhos-conplat-core-operators
    • None
    • Release Note Not Required
    • Low

      https://github.com/openstack-k8s-operators/watcher-operator/pull/162 adds the following PoC job to test EDPM content on CentOS Stream 10:

      - nodeset:
          name: centos-9-medium-2x-centos-9-crc-cloud-ocp-4-18-1-xxl-vexxhost
          nodes:
            - name: controller
              label: cloud-centos-9-stream-tripleo-vexxhost-medium
            - name: compute-0
              label: cloud-centos-9-stream-tripleo-vexxhost
            - name: compute-1
              label: cloud-centos-9-stream-tripleo-vexxhost
            - name: crc
              label: crc-cloud-ocp-4-18-1-xxl
          groups:
            - name: computes
              nodes:
                - compute-0
                - compute-1
            - name: ocps
              nodes:
                - crc

      - job:
          name: podified-multinode-edpm-deployment-crc-2comp-cs10
          parent: podified-multinode-edpm-deployment-crc-2comp
          nodeset: centos-9-medium-2x-centos-9-crc-cloud-ocp-4-18-1-xxl-vexxhost
          vars:
            cifmw_update_containers_openstack: true
            cifmw_update_containers_use_valkey: true
            cifmw_update_containers_org: podified-master-centos10
            cifmw_update_containers_registry: quay.rdoproject.org
            cifmw_update_containers_tag: 0e75cb30c06f5bce6a42ee75c7be5c50
            cifmw_update_containers: true
            cifmw_extras:
              - "@{{ ansible_user_dir }}/{{ zuul.projects['github.com/openstack-k8s-operators/ci-framework'].
                 src_dir }}/scenarios/centos-9/multinode-ci.yml"
              - "@{{ ansible_user_dir }}/{{ zuul.projects['github.com/openstack-k8s-operators/ci-framework'].
                 src_dir }}/scenarios/centos-9/horizon.yml"
              - "@{{ ansible_user_dir }}/{{ zuul.projects['github.com/openstack-k8s-operators/watcher-operator'].
                 src_dir }}/ci/scenarios/edpm.yml"

      The control plane deployment failed with `cinder-scheduler-0` going into `CrashLoopBackOff` state.

      After checking the cinder-scheduler-0 pod log, we found the following error:

      2025-05-14 05:25:13.864 1 ERROR oslo.messaging._drivers.impl_rabbit [None req-415dae18-0836-4781-8d8d-4dd1a8595d7f - - - - - -] Connection failed: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1010) (retrying in 1.0 seconds): ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1010)

      We then looked at the rabbitmq-server pod log and found the following error:

      2025-05-14 01:32:57.881294-04:00 [notice] <0.9124.0> TLS server: In state wait_finished at tls_record_1_3.erl:213 generated SERVER ALERT: Fatal - Bad Record MAC
      2025-05-14 01:32:57.881294-04:00 [notice] <0.9124.0>  - {record_type_mismatch,21}
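RabbitMQ pod logs can carry ANSI colour sequences, which makes the TLS alerts awkward to grep. The filter below is a minimal sketch of one way to isolate them: it strips the colour codes and keeps each alert line plus the detail line that follows it. The function name `tls_alerts` is hypothetical, and the `openstack` namespace and pod name placeholder in the usage comment are assumptions, not taken from the job.

```shell
# Hypothetical helper: read a pod log on stdin, strip ANSI colour sequences,
# then print each TLS alert line together with the line that follows it.
tls_alerts() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;]*m//g" | grep -A1 'generated SERVER ALERT'
}

# Usage (namespace and pod name are assumptions, adjust to your deployment):
#   oc logs -n openstack <rabbitmq-server-pod> | tls_alerts
```

Reading from stdin keeps the helper independent of how the log was collected, so it works equally on a saved CI artifact.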

      It uses the `quay.rdoproject.org/podified-master-centos10/openstack-rabbitmq:0e75cb30c06f5bce6a42ee75c7be5c50` container, which installs `rabbitmq-server x86_64 3.13.7` and `erlang-asn1 x86_64 26.2.5-1.el10` from the DLRN master deps repo in the tcib job.

      Based on a discussion with Luca Miccini in a Slack thread, this may be caused by a bogus certificate, or CentOS Stream 10 may enforce some RSA/DSA-related restrictions that do not work well with RabbitMQ. A reproducer is needed to confirm the cause.
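One way to gather evidence for either hypothesis is to look at the subject, issuer, signature algorithm, and key size of the certificate RabbitMQ serves. The following is a sketch only: `cert_summary` is a hypothetical helper, and the secret name in the usage comment is a placeholder because the issue does not say where the certificate is stored.

```shell
# Hypothetical helper: summarise a PEM certificate read from stdin.
# Prints subject, issuer, signature algorithm(s), and public key size.
cert_summary() {
  openssl x509 -noout -subject -issuer -text |
    grep -E 'subject=|issuer=|Signature Algorithm|Public-Key'
}

# Usage (secret name and namespace are placeholders or assumptions):
#   oc get secret -n openstack <rabbitmq-tls-secret> \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d | cert_summary
```

If subject and issuer are identical, the certificate is self-signed, which would line up with the `self-signed certificate in certificate chain` message on the client side; a short key or a legacy signature algorithm would instead point toward the crypto-policy hypothesis.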

      Note: If we deploy the control plane with TLS disabled, the deployment succeeds.

      Below is a reproducer for this issue on a CentOS Stream 9 install_yamls dev box:

      git clone https://github.com/openstack-k8s-operators/install_yamls.git
      cd install_yamls/devsetup
      make download_tools
      CPUS=12 MEMORY=25600 DISK=100 make crc
      eval $(crc oc-env)
      oc login -u kubeadmin -p 12345678 https://api.crc.testing:6443
      make crc_attach_default_interface
      cd ..
      make crc_storage
      make input
      make openstack
      make openstack_init
      # Add quay.rdoproject.org to insecure registry
      oc patch --type=merge --patch='{"spec": {"registrySources": {"insecureRegistries": ["quay.rdoproject.org"]}}}' image.config.openshift.io/cluster
      oc patch --type=merge --patch='{"spec": {"registrySources": {"allowedRegistries": ["quay.rdoproject.org","quay.io","gcr.io","registry.redhat.io","image-registry.openshift-image-registry.svc:5000"]}}}' image.config.openshift.io/cluster
      # Download the attached update_containers.yml file to use cs10 master containers
      oc apply -f update_containers.yml
      make openstack_deploy
      # Wait about 20 minutes until the cinder-scheduler-0 and rabbitmq-server pods are running.
      # Check the logs of both pods; they contain the relevant error messages.
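For the final log check, a small counter can confirm quickly whether the client-side failure has been reproduced. This is a sketch under assumptions: `count_cert_failures` is a hypothetical name, and the `openstack` namespace in the usage comment is assumed rather than stated in the issue.

```shell
# Hypothetical helper: count log lines on stdin that report a certificate
# verification failure. Prints 0 when none are found.
count_cert_failures() {
  grep -c 'CERTIFICATE_VERIFY_FAILED'
}

# Usage (namespace is an assumption):
#   oc logs -n openstack cinder-scheduler-0 --all-containers | count_cert_failures
```

A non-zero count on cinder-scheduler-0, combined with `Bad Record MAC` alerts in the rabbitmq-server log, matches the failure described above.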

              rhn-support-mschuppe Martin Schuppert
              rhn-engineering-chkumar Chandan Kumar
              rhos-conplat-core-operators