Red Hat OpenStack Services on OpenShift / OSPRH-6672

Live migration fails with TLS cert error when StorageMgmt network is defined

      When the "StorageMgmt" network is defined in the NodeSet, the EDPM node is deployed so that canonical_hostname points to the ctlplane domain, while the DNS search order lists the storagemgmt domain first. As a result, hostname -f returns the storagemgmt FQDN, which leads to certificate validation errors during live migration.

      Apr 30 08:31:03 compute-2 virtqemud[106408]: QEMU_MONITOR_RECV_REPLY: mon=0x7fdd50008180 reply={"return": {"status": "failed", "error-desc": "Certificate does not match the hostname compute-1.storagemgmt.example.com"}, "id": "libvirt-438"}
      Apr 30 08:31:03 compute-2 virtqemud[106408]: operation failed: job 'migration out' failed: Certificate does not match the hostname compute-1.storagemgmt.example.com
      Apr 30 08:31:04 compute-2 virtqemud[106408]: internal error: QEMU unexpectedly closed the monitor (vm='instance-00000091'): 2024-04-30T08:31:03.714360Z qemu-kvm: Cannot read from TLS channel: Input/output error
                                                   2024-04-30T08:31:03.714746Z qemu-kvm: Cannot read from TLS channel: Input/output error
                                                   2024-04-30T08:31:03.714863Z qemu-kvm: Cannot read from TLS channel: Input/output error
                                                   2024-04-30T08:31:03.715030Z qemu-kvm: Not a migration stream
                                                   2024-04-30T08:31:03.715205Z qemu-kvm: load of migration failed: Invalid argument
      
      [root@compute-2 ~]# cat /etc/resolv.conf
      # Generated by NetworkManager
      search storagemgmt.example.com ctlplane.example.com internalapi.example.com storage.example.com tenant.example.com ocp.openstack.lab
      nameserver 192.168.122.80
      [root@compute-2 ~]# hostname -f
      compute-2.storagemgmt.example.com
      
      [zuul@controller-0 ~]$ oc get secret dataplanenodeset-openstack-edpm -o json|jq -r '.data["inventory"]'|base64 -d|grep can
                  canonical_hostname: compute-0.ctlplane.example.com
                  canonical_hostname: compute-1.ctlplane.example.com
                  canonical_hostname: compute-2.ctlplane.example.com
      
      [root@compute-2 ~]# cat /etc/os-net-config/config.yaml |grep domain
          domain: ['storagemgmt.example.com', 'ctlplane.example.com', 'internalapi.example.com', 'storage.example.com', 'tenant.example.com']
      
      [zuul@controller-0 architecture]$ oc get secret dataplanenodeset-openstack-edpm -o json|jq -r '.data["inventory"]'|base64 -d|grep dns_search_domains: -A5
                  dns_search_domains:
                      - storagemgmt.example.com
                      - ctlplane.example.com
                      - internalapi.example.com
                      - storage.example.com
                      - tenant.example.com
      --
                  dns_search_domains:
                      - storagemgmt.example.com
                      - ctlplane.example.com
                      - internalapi.example.com
                      - storage.example.com
                      - tenant.example.com
      --
                  dns_search_domains:
                      - storagemgmt.example.com
                      - ctlplane.example.com
                      - internalapi.example.com
                      - storage.example.com
                      - tenant.example.com
      
      [zuul@controller-0 architecture]$ oc get ipset  compute-2 -o json|jq .status.reservations
      [
        {
          "address": "172.20.0.103",
          "cidr": "172.20.0.0/24",
          "dnsDomain": "storagemgmt.example.com",
          "mtu": 1500,
          "network": "StorageMgmt",
          "subnet": "subnet1",
          "vlan": 23
        },
        {
          "address": "192.168.122.102",
          "cidr": "192.168.122.0/24",
          "dnsDomain": "ctlplane.example.com",
          "gateway": "192.168.122.1",
          "mtu": 1500,
          "network": "ctlplane",
          "routes": [
            {
              "destination": "0.0.0.0/0",
              "nexthop": "192.168.122.1"
            }
          ],
          "subnet": "subnet1"
        },
        {
          "address": "172.17.0.103",
          "cidr": "172.17.0.0/24",
          "dnsDomain": "internalapi.example.com",
          "mtu": 1496,
          "network": "internalapi",
          "subnet": "subnet1",
          "vlan": 20
        },
        {
          "address": "172.18.0.103",
          "cidr": "172.18.0.0/24",
          "dnsDomain": "storage.example.com",
          "mtu": 1496,
          "network": "storage",
          "subnet": "subnet1",
          "vlan": 21
        },
        {
          "address": "172.19.0.103",
          "cidr": "172.19.0.0/24",
          "dnsDomain": "tenant.example.com",
          "mtu": 1496,
          "network": "tenant",
          "subnet": "subnet1",
          "vlan": 22
        }
      ]
      

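      This is caused by infra-operator sorting the network names lexicographically. Go compares strings byte by byte, and the upper-case "S" (0x53) sorts before the lower-case "c" (0x63), so StorageMgmt is ordered before ctlplane.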
      https://github.com/openstack-k8s-operators/infra-operator/blame/main/controllers/network/ipset_controller.go#L278-L281
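
The ordering can be reproduced with a few lines of Go. This is only a minimal sketch using sort.Strings from the standard library; the helper name sortNetworkNames is made up for illustration and the actual infra-operator code at the link above may sort differently (for example with its own comparison function).

```go
package main

import (
	"fmt"
	"sort"
)

// sortNetworkNames is a hypothetical helper (not infra-operator code).
// It returns a sorted copy of names using Go's default byte-wise
// string ordering.
func sortNetworkNames(names []string) []string {
	out := append([]string(nil), names...)
	sort.Strings(out)
	return out
}

func main() {
	names := []string{"ctlplane", "internalapi", "storage", "tenant", "StorageMgmt"}
	// Upper-case letters have lower byte values than lower-case ones,
	// so "StorageMgmt" ends up first.
	fmt.Println(sortNetworkNames(names))
	// prints: [StorageMgmt ctlplane internalapi storage tenant]
	fmt.Println("StorageMgmt" < "ctlplane")
	// prints: true
}
```

The resulting order matches the order of the reservations and of dns_search_domains shown above.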

      As a workaround, use only lower-case network names in the NodeSet, and choose the names so that ctlplane comes first after Go's string sorting.
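
Whether a given set of network names satisfies the workaround can be checked up front. The following is a rough sketch under the assumption that the ordering is a plain byte-wise sort; ctlplaneSortsFirst is a hypothetical helper, not part of any operator.

```go
package main

import (
	"fmt"
	"sort"
)

// ctlplaneSortsFirst is a hypothetical helper. It reports whether
// "ctlplane" is the first element after a byte-wise sort of names.
func ctlplaneSortsFirst(names []string) bool {
	if len(names) == 0 {
		return false
	}
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	return sorted[0] == "ctlplane"
}

func main() {
	// All lower case, ctlplane sorts first.
	fmt.Println(ctlplaneSortsFirst([]string{"storagemgmt", "ctlplane", "internalapi", "storage", "tenant"}))
	// Mixed case, StorageMgmt sorts first.
	fmt.Println(ctlplaneSortsFirst([]string{"StorageMgmt", "ctlplane", "internalapi", "storage", "tenant"}))
	// Lower case alone is not enough: "bgp" sorts before "ctlplane".
	fmt.Println(ctlplaneSortsFirst([]string{"bgp", "ctlplane"}))
}
```

The last call illustrates why the second half of the workaround matters: a lower-case name such as the made-up "bgp" would still displace ctlplane.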

      A proper fix could be to drop the lexicographical ordering of IP reservations: the infra-operator should keep the reservations in the order they are given, without reordering them. The dataplane-operator could then implement a validation webhook that ensures the first network in the NodeSet is always ctlplane.
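
The core of such a validation could look roughly like the sketch below. It is not the dataplane-operator webhook API: validateCtlplaneFirst is a hypothetical function that takes the network names in NodeSet order, and the case-insensitive comparison is an assumption made here for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateCtlplaneFirst is a hypothetical check that a webhook could
// call. It returns an error unless the first network is ctlplane.
// Assumption: the name is compared case-insensitively.
func validateCtlplaneFirst(networks []string) error {
	if len(networks) == 0 {
		return errors.New("no networks defined")
	}
	if !strings.EqualFold(networks[0], "ctlplane") {
		return fmt.Errorf("first network must be ctlplane, got %q", networks[0])
	}
	return nil
}

func main() {
	fmt.Println(validateCtlplaneFirst([]string{"StorageMgmt", "ctlplane"}))
	fmt.Println(validateCtlplaneFirst([]string{"ctlplane", "StorageMgmt"}))
}
```

A check like this only helps if the infra-operator preserves the NodeSet order, so the two changes would need to land together.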

      See also the Slack discussion: https://redhat-internal.slack.com/archives/CQXJFGMK6/p1714474966545229

              smooney@redhat.com Sean Mooney
              rh-ee-bgibizer Balazs Gibizer
              rhos-dfg-compute