OpenShift Bugs / OCPBUGS-1139

The BMH gets stuck in state "available" when we try to deploy a spoke cluster with ALLOW_CONVERGED_FLOW=True in assisted-service

    Description

      Version:
      4.12.0-0.nightly-2022-09-08-114806
      multicluster-engine.v2.2.0

      Tried to deploy a spoke SNO cluster from a hub SNO cluster.

      The BMH went from "preparing" to "available" and then got stuck in that state.
      The ironic-agent log on the spoke shows this error repeating:

      2022-09-10 14:07:55.529 1 ERROR ironic_python_agent.agent [-] error sending heartbeat to https://10.19.134.13:6385: ironic_python_agent.errors.HeartbeatError: Error heartbeating to agent API: Error 404: Node 48d24898-1911-4f43-82b0-0b15f8484ae7 could not be found.
      2022-09-10 14:07:55.529 1 ERROR ironic_python_agent.agent Traceback (most recent call last):
      2022-09-10 14:07:55.529 1 ERROR ironic_python_agent.agent   File "/usr/lib/python3.6/site-packages/ironic_python_agent/agent.py", line 122, in do_heartbeat
      2022-09-10 14:07:55.529 1 ERROR ironic_python_agent.agent     generated_cert=self.agent.generated_cert,
      2022-09-10 14:07:55.529 1 ERROR ironic_python_agent.agent   File "/usr/lib/python3.6/site-packages/ironic_python_agent/ironic_api_client.py", line 175, in heartbeat
      2022-09-10 14:07:55.529 1 ERROR ironic_python_agent.agent     raise errors.HeartbeatError(error)
      2022-09-10 14:07:55.529 1 ERROR ironic_python_agent.agent ironic_python_agent.errors.HeartbeatError: Error heartbeating to agent API: Error 404: Node 48d24898-1911-4f43-82b0-0b15f8484ae7 could not be found.
      2022-09-10 14:07:55.529 1 ERROR ironic_python_agent.agent 
      2022-09-10 14:07:55.529 1 INFO ironic_python_agent.agent [-] sleeping before next heartbeat, interval: 153.94185373943864
      2022-09-10 14:08:00.530 1 DEBUG ironic_python_agent.ironic_api_client [-] Heartbeat: announcing callback URL https://10.19.134.5:9999, API version is 1.68 heartbeat /usr/lib/python3.6/site-packages/ironic_python_agent/ironic_api_client.py:162
      /usr/lib/python3.6/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
        InsecureRequestWarning)
      2022-09-10 14:08:00.544 1 ERROR ironic_python_agent.agent [-] error sending heartbeat to https://10.19.134.13:6385: ironic_python_agent.errors.HeartbeatError: Error heartbeating to agent API: Error 404: Node 48d24898-1911-4f43-82b0-0b15f8484ae7 could not be found.
      2022-09-10 14:08:00.544 1 ERROR ironic_python_agent.agent Traceback (most recent call last):
      2022-09-10 14:08:00.544 1 ERROR ironic_python_agent.agent   File "/usr/lib/python3.6/site-packages/ironic_python_agent/agent.py", line 122, in do_heartbeat
      2022-09-10 14:08:00.544 1 ERROR ironic_python_agent.agent     generated_cert=self.agent.generated_cert,
      2022-09-10 14:08:00.544 1 ERROR ironic_python_agent.agent   File "/usr/lib/python3.6/site-packages/ironic_python_agent/ironic_api_client.py", line 175, in heartbeat
      2022-09-10 14:08:00.544 1 ERROR ironic_python_agent.agent     raise errors.HeartbeatError(error)
      2022-09-10 14:08:00.544 1 ERROR ironic_python_agent.agent ironic_python_agent.errors.HeartbeatError: Error heartbeating to agent API: Error 404: Node 48d24898-1911-4f43-82b0-0b15f8484ae7 could not be found.
      2022-09-10 14:08:00.544 1 ERROR ironic_python_agent.agent 
      2022-09-10 14:08:00.544 1 INFO ironic_python_agent.agent [-] sleeping before next heartbeat, interval: 129.89219707896012
      2022-09-10 14:08:05.545 1 DEBUG ironic_python_agent.ironic_api_client [-] Heartbeat: announcing callback URL https://10.19.134.5:9999, API version is 1.68 heartbeat /usr/lib/python3.6/site-packages/ironic_python_agent/ironic_api_client.py:162
      /usr/lib/python3.6/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
        InsecureRequestWarning)
      2022-09-10 14:08:05.559 1 ERROR ironic_python_agent.agent [-] error sending heartbeat to https://10.19.134.13:6385: ironic_python_agent.errors.HeartbeatError: Error heartbeating to agent API: Error 404: Node 48d24898-1911-4f43-82b0-0b15f8484ae7 could not be found.
      2022-09-10 14:08:05.559 1 ERROR ironic_python_agent.agent Traceback (most recent call last):
      2022-09-10 14:08:05.559 1 ERROR ironic_python_agent.agent   File "/usr/lib/python3.6/site-packages/ironic_python_agent/agent.py", line 122, in do_heartbeat
      2022-09-10 14:08:05.559 1 ERROR ironic_python_agent.agent     generated_cert=self.agent.generated_cert,
      2022-09-10 14:08:05.559 1 ERROR ironic_python_agent.agent   File "/usr/lib/python3.6/site-packages/ironic_python_agent/ironic_api_client.py", line 175, in heartbeat
      2022-09-10 14:08:05.559 1 ERROR ironic_python_agent.agent     raise errors.HeartbeatError(error)
      2022-09-10 14:08:05.559 1 ERROR ironic_python_agent.agent ironic_python_agent.errors.HeartbeatError: Error heartbeating to agent API: Error 404: Node 48d24898-1911-4f43-82b0-0b15f8484ae7 could not be found.
      2022-09-10 14:08:05.559 1 ERROR ironic_python_agent.agent 
      2022-09-10 14:08:05.560 1 INFO ironic_python_agent.agent [-] sleeping before next heartbeat, interval: 132.4694016992394
      2022-09-10 14:08:10.560 1 DEBUG ironic_python_agent.ironic_api_client [-] Heartbeat: announcing callback URL https://10.19.134.5:9999, API version is 1.68 heartbeat /usr/lib/python3.6/site-packages/ironic_python_agent/ironic_api_client.py:162
      /usr/lib/python3.6/site-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
        InsecureRequestWarning)
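
To check that every failed heartbeat in a longer log refers to the same Ironic endpoint and node, the 404 lines can be counted per node UUID. This is a minimal sketch and not part of the original report: the function name `heartbeat_404_nodes` and the regular expression are assumptions derived only from the log lines quoted above, and other ironic-python-agent versions may word the message differently.

```python
import re
from collections import Counter

# Pattern based on the "error sending heartbeat" lines in this report
# (an assumption, not an official log format).
HEARTBEAT_404 = re.compile(
    r"ERROR ironic_python_agent\.agent \[-\] error sending heartbeat to (?P<url>\S+): "
    r".*Error 404: Node (?P<node>[0-9a-f-]{36}) could not be found"
)


def heartbeat_404_nodes(log_text):
    """Count heartbeat 404 errors per (Ironic URL, node UUID) pair.

    Only the first line of each error is matched, so the traceback lines
    that repeat the same message are not counted twice.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = HEARTBEAT_404.search(line)
        if match:
            counts[(match.group("url"), match.group("node"))] += 1
    return counts
```

A single key in the result means that all rejected heartbeats target one node on one Ironic API endpoint, which is what the excerpt above shows.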
      

      Collecting the BMH state from the hub:

      
      oc get bmh master-1-0 -o yaml
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        annotations:
          baremetalhost.metal3.io/detached: assisted-service-controller
          bmac.agent-install.openshift.io/hostname: master-1-0
          bmac.agent-install.openshift.io/role: master
        creationTimestamp: "2022-09-10T00:01:18Z"
        finalizers:
        - baremetalhost.metal3.io
        generation: 1
        labels:
          infraenvs.agent-install.openshift.io: qe2
        name: master-1-0
        namespace: qe2
        resourceVersion: "102761"
        uid: ec96109c-a05a-40b9-90ee-1756998d020e
      spec:
        automatedCleaningMode: disabled
        bmc:
          address: idrac-virtualmedia://10.19.133.17/redfish/v1/Systems/System.Embedded.1
          credentialsName: bmc-secret1
          disableCertificateVerification: true
        bootMACAddress: 98:03:9b:61:7c:61
        online: true
        rootDeviceHints:
          deviceName: /dev/sda
      status:
        errorCount: 0
        errorMessage: ""
        goodCredentials:
          credentials:
            name: bmc-secret1
            namespace: qe2
          credentialsVersion: "89066"
        hardware:
          cpu:
            arch: x86_64
            clockMegahertz: 3700
            count: 64
            flags:
            - 3dnowprefetch
            - abm
            - acpi
            - adx
            - aes
            - aperfmperf
            - apic
            - arat
            - arch_capabilities
            - arch_perfmon
            - art
            - avx
            - avx2
            - avx512bw
            - avx512cd
            - avx512dq
            - avx512f
            - avx512vl
            - bmi1
            - bmi2
            - bts
            - cat_l3
            - cdp_l3
            - clflush
            - clflushopt
            - clwb
            - cmov
            - constant_tsc
            - cpuid
            - cpuid_fault
            - cqm
            - cqm_llc
            - cqm_mbm_local
            - cqm_mbm_total
            - cqm_occup_llc
            - cx16
            - cx8
            - dca
            - de
            - ds_cpl
            - dtes64
            - dtherm
            - dts
            - epb
            - ept
            - ept_ad
            - erms
            - est
            - f16c
            - flexpriority
            - flush_l1d
            - fma
            - fpu
            - fsgsbase
            - fxsr
            - hle
            - ht
            - ibpb
            - ibrs
            - ida
            - intel_ppin
            - intel_pt
            - invpcid
            - invpcid_single
            - lahf_lm
            - lm
            - mba
            - mca
            - mce
            - md_clear
            - mmx
            - monitor
            - movbe
            - mpx
            - msr
            - mtrr
            - nonstop_tsc
            - nopl
            - nx
            - ospke
            - pae
            - pat
            - pbe
            - pcid
            - pclmulqdq
            - pdcm
            - pdpe1gb
            - pebs
            - pge
            - pku
            - pln
            - pni
            - popcnt
            - pse
            - pse36
            - pti
            - pts
            - rdrand
            - rdseed
            - rdt_a
            - rdtscp
            - rep_good
            - rtm
            - sdbg
            - sep
            - smap
            - smep
            - smx
            - ss
            - ssbd
            - sse
            - sse2
            - sse4_1
            - sse4_2
            - ssse3
            - stibp
            - syscall
            - tm
            - tm2
            - tpr_shadow
            - tsc
            - tsc_adjust
            - tsc_deadline_timer
            - vme
            - vmx
            - vnmi
            - vpid
            - x2apic
            - xgetbv1
            - xsave
            - xsavec
            - xsaveopt
            - xsaves
            - xtopology
            - xtpr
            model: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
          firmware:
            bios:
              date: 12/17/2018
              vendor: Dell Inc.
              version: 1.6.13
          hostname: openshift-worker-0.qe1.kni.lab.eng.bos.redhat.com
          nics:
          - ip: 10.19.134.5
            mac: 98:03:9b:61:7c:61
            model: 0x15b3 0x1015
            name: eno2
          - ip: fe80::9a03:9bff:fe61:7c60%eno1.3100
            mac: 98:03:9b:61:7c:60
            name: eno1.3100
          - ip: fe80::9a03:9bff:fe61:7c60%eno1.2100
            mac: 98:03:9b:61:7c:60
            name: eno1.2100
          - ip: fe80::9a03:9bff:fe61:7c61%eno2.200
            mac: 98:03:9b:61:7c:61
            name: eno2.200
          - mac: 98:03:9b:61:7c:60
            model: 0x15b3 0x1015
            name: eno1
          - ip: fe80::9a03:9bff:fe61:7c60%eno1.1100
            mac: 98:03:9b:61:7c:60
            name: eno1.1100
          ramMebibytes: 196608
          storage:
          - hctl: "1:2:0:0"
            model: PERC H330 Mini
            name: /dev/sdb
            rotational: true
            sizeBytes: 479559942144
            type: HDD
            vendor: DELL
          - model: Dell Express Flash NVMe P4610 1.6TB SFF
            name: /dev/nvme0n1
            sizeBytes: 1600000000000
            type: NVME
          - model: Dell Express Flash NVMe P4610 1.6TB SFF
            name: /dev/nvme1n1
            sizeBytes: 1600000000000
            type: NVME
          - model: Dell Express Flash NVMe P4610 1.6TB SFF
            name: /dev/nvme2n1
            sizeBytes: 1600000000000
            type: NVME
          - model: Dell Express Flash NVMe P4610 1.6TB SFF
            name: /dev/nvme3n1
            sizeBytes: 1600000000000
            type: NVME
          systemVendor:
            manufacturer: Dell Inc.
            productName: PowerEdge R640 (SKU=NotProvided;ModelName=PowerEdge R640)
            serialNumber: 176S2W2
        hardwareProfile: unknown
        lastUpdated: "2022-09-10T00:18:32Z"
        operationHistory:
          deprovision:
            end: null
            start: null
          inspect:
            end: "2022-09-10T00:08:45Z"
            start: "2022-09-10T00:01:41Z"
          provision:
            end: null
            start: null
          register:
            end: "2022-09-10T00:01:41Z"
            start: "2022-09-10T00:01:18Z"
        operationalStatus: detached
        poweredOn: false
        provisioning:
          ID: 48d24898-1911-4f43-82b0-0b15f8484ae7
          bootMode: UEFI
          image:
            url: ""
          raid:
            hardwareRAIDVolumes: null
            softwareRAIDVolumes: []
          rootDeviceHints:
            deviceName: /dev/sda
          state: available
        triedCredentials:
          credentials:
            name: bmc-secret1
            namespace: qe2
          credentialsVersion: "89066"
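
The handful of fields in this dump that describe the stuck host can be pulled out programmatically. The following is a hedged sketch, not part of the original report: `summarize_bmh` is a hypothetical helper, and it assumes the resource was fetched as JSON (for example with `oc get bmh master-1-0 -n qe2 -o json`) and parsed into a dictionary.

```python
import json


def summarize_bmh(bmh):
    """Return the BareMetalHost fields relevant to this report.

    `bmh` is the parsed JSON of one BareMetalHost resource. Missing keys
    yield None instead of raising.
    """
    metadata = bmh.get("metadata", {})
    status = bmh.get("status", {})
    provisioning = status.get("provisioning", {})
    return {
        "name": metadata.get("name"),
        "state": provisioning.get("state"),
        "ironic_node_id": provisioning.get("ID"),
        "operational_status": status.get("operationalStatus"),
        "powered_on": status.get("poweredOn"),
        "detached_by": metadata.get("annotations", {}).get(
            "baremetalhost.metal3.io/detached"
        ),
    }


if __name__ == "__main__":
    # Hypothetical file name; save the `oc get ... -o json` output there first.
    with open("bmh.json") as handle:
        print(json.dumps(summarize_bmh(json.load(handle)), indent=2))
```

For the host above, this would report state `available`, operational status `detached`, and `poweredOn: false`. The `provisioning.ID` value, `48d24898-1911-4f43-82b0-0b15f8484ae7`, is the same node UUID that the agent's heartbeat receives a 404 for.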
      

            People

              rhn-engineering-dtantsur Dmitry Tantsur
              achuzhoy@redhat.com Alexander Chuzhoy