  1. Red Hat Advanced Cluster Management
  2. ACM-30232

IBI SNO deployment failed due to ManagedCluster stuck in "Unknown" state


    • Bug
    • Resolution: Done
    • Undefined
    • None
    • ACM 2.16.0
    • SiteConfig Operator
    • None
    • False
    • False
    • Critical
    • None

      Description of problem:

      Deployment of an IBI cluster (OCP 4.22, ACM 2.16) fails when the wait for the ztp-done label on the helix82 ManagedCluster times out, because the ManagedCluster stays stuck in the Unknown state.

      Deployment log:

      TASK [deploy-gitops-du : Wait for helix82 ManagedCluster ztp-done label] *******
      FAILED - RETRYING: ... (45 retries)
      fatal: [registry.kni-qe-81.telcoqe.eng.rdu2.dc.redhat.com]: FAILED! =>
      {"attempts": 45, "cmd": "oc get ManagedCluster helix82 -o json | jq -r '.metadata.labels[\"ztp-done\"]'\n",
       "stdout": "null", "stdout_lines": ["null"], "rc": 0, ...} 

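The wait task in the log can be reproduced off-cluster with a small polling helper. This is a minimal sketch, assuming the task retries until the `ztp-done` label is non-null; the function names, the `fetch_json` callable (for example a wrapper around `oc get ManagedCluster helix82 -o json`), and the 60-second delay are hypothetical, since the log only shows the 45 attempts.

```python
import json
import time


def ztp_done_label(managed_cluster_json):
    """Return the ztp-done label value, or None when it is absent.

    Mirrors the jq expression in the task: .metadata.labels["ztp-done"]
    """
    obj = json.loads(managed_cluster_json)
    labels = (obj.get("metadata") or {}).get("labels") or {}
    return labels.get("ztp-done")


def wait_for_ztp_done(fetch_json, attempts=45, delay_seconds=60, sleep=time.sleep):
    """Poll until the label appears or the attempts run out.

    fetch_json is a hypothetical callable returning the ManagedCluster as a
    JSON string. delay_seconds is an assumed value, not taken from the log.
    """
    for _ in range(attempts):
        if ztp_done_label(fetch_json()) is not None:
            return True
        sleep(delay_seconds)
    return False
```

With a ManagedCluster that never receives the label, as in this report, `wait_for_ztp_done` returns `False` after the last attempt, which corresponds to the `"stdout": "null"` failure shown in the log.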
      From the hub:

      [kni@registry.kni-qe-81 temp-2]$ oc get no 
      NAME                                                          STATUS   ROLES                         AGE     VERSION
      openshift-master-0.kni-qe-81.telcoqe.eng.rdu2.dc.redhat.com   Ready    control-plane,master,worker   5d21h   v1.34.2
      openshift-master-1.kni-qe-81.telcoqe.eng.rdu2.dc.redhat.com   Ready    control-plane,master,worker   5d21h   v1.34.2
      openshift-master-2.kni-qe-81.telcoqe.eng.rdu2.dc.redhat.com   Ready    control-plane,master,worker   5d21h   v1.34.2
      [kni@registry.kni-qe-81 temp-2]$ oc get bmh -A 
      NAMESPACE               NAME                                     STATE                    CONSUMER                   ONLINE   ERROR   AGE
      helix82                 helix82.telcoqe.eng.rdu2.dc.redhat.com   externally provisioned                              true             16h
      openshift-machine-api   openshift-master-0                       provisioned              kni-qe-81-hvkch-master-0   true             5d21h
      openshift-machine-api   openshift-master-1                       provisioned              kni-qe-81-hvkch-master-1   true             5d21h
      openshift-machine-api   openshift-master-2                       provisioned              kni-qe-81-hvkch-master-2   true             5d21h
      [kni@registry.kni-qe-81 temp-2]$ oc get managedcluster -A 
      NAME            HUB ACCEPTED   MANAGED CLUSTER URLS                                        JOINED   AVAILABLE   AGE
      helix82         true                                                                                Unknown     16h
      local-cluster   true           https://api.kni-qe-81.telcoqe.eng.rdu2.dc.redhat.com:6443   True     True        5d20h
      [kni@registry.kni-qe-81 temp-2]$ oc get clusterdeployment -A 
      NAMESPACE   NAME      INFRAID         PLATFORM        REGION   VERSION                              CLUSTERTYPE   PROVISIONSTATUS   POWERSTATE   AGE
      helix82     helix82   helix82-ncvrv   none-platform            4.22.0-0.nightly-2026-02-08-124411                 Provisioned       Running      16h
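
The table columns above (HUB ACCEPTED, JOINED, AVAILABLE) are derived from the ManagedCluster status conditions. The following is a minimal sketch for summarizing them from `oc get managedcluster helix82 -o json` output. The condition type names are the ones I understand the Open Cluster Management API to use and should be treated as an assumption here, as should the function name and the sample document in the usage note.

```python
import json

# Assumed mapping from ManagedCluster condition types to the table columns.
CONDITION_TYPES = {
    "HubAcceptedManagedCluster": "hub_accepted",
    "ManagedClusterJoined": "joined",
    "ManagedClusterConditionAvailable": "available",
}


def summarize_conditions(managed_cluster_json):
    """Return a dict of column name -> condition status (None when missing)."""
    obj = json.loads(managed_cluster_json)
    conditions = (obj.get("status") or {}).get("conditions") or []
    summary = {name: None for name in CONDITION_TYPES.values()}
    for cond in conditions:
        key = CONDITION_TYPES.get(cond.get("type"))
        if key:
            summary[key] = cond.get("status")
    return summary
```

For a hypothetical cluster that was accepted by the hub but never joined, the summary would show `hub_accepted` as `"True"`, `joined` as `None`, and `available` as `"Unknown"`, which matches the helix82 row above.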
      

      From the spoke journal (journalctl), first boot failed while applying the network configuration:

      Feb 17 19:36:53 localhost.localdomain lca-cli[3350]: level=error msg="failed to configure networking, err: failed to apply static network: failed to apply nmstate config ... err: failed to run \"nmstatectl\" in host namespace with args [apply /opt/openshift/nmstate.yaml]: ...
      NmstateError: Bug: DbusConnectionError: org.freedesktop.DBus.Error.ServiceUnknown: The name is not activatable
      : exit status 1" 

      The SNO configuration script failed:

      Feb 17 19:36:53 localhost.localdomain lca-cli[3350]: level=fatal msg="Post pivot operation failed"
      Feb 17 19:36:53 localhost.localdomain systemd[1]: installation-configuration.service: Main process exited, code=exited, status=1/FAILURE
      Feb 17 19:36:53 localhost.localdomain systemd[1]: Failed to start Image base SNO configuration script. 
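
When triaging a spoke journal like the one above, it can help to pull out only the `lca-cli` entries logged at `error` or `fatal` level. This is a minimal sketch, assuming the `level=... msg="..."` line shape seen in the excerpts; the function name and regular expression are illustrative and not part of lca-cli.

```python
import re

# Matches journal lines such as:
#   ... lca-cli[3350]: level=fatal msg="Post pivot operation failed"
LCA_LINE = re.compile(r'lca-cli\[\d+\]: level=(?P<level>\w+) msg="(?P<msg>.*)')


def lca_failures(journal_lines):
    """Return (level, message) pairs for lca-cli error and fatal entries."""
    failures = []
    for line in journal_lines:
        match = LCA_LINE.search(line)
        if match and match.group("level") in ("error", "fatal"):
            msg = match.group("msg").rstrip()
            # Drop the closing quote when the message ends on this line.
            if msg.endswith('"'):
                msg = msg[:-1]
            failures.append((match.group("level"), msg))
    return failures
```

Messages that continue across several journal lines, like the nmstate error above, are returned only up to the end of their first line.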

       

      Version-Release number of selected component (if applicable):

      ACM 2.16.0-237
      OCP 4.22.0-ec.2

      MCE 2.11.0-268

      How reproducible:

      3 times in a row

      Steps to Reproduce:

      1. Deploy a hub with OCP 4.22 and ACM 2.16.
      2. Start the IBI CI job.
      3. The job fails during the step "Deploy SNO using IBI 4.22 with operators from brew".

      Actual results:

      The deployment reaches the timeout while waiting for the ztp-done label.

      Expected results:

      The deployment completes.

      Additional info:

              sakhoury@redhat.com Sharat Akhoury
              rh-ee-bazem Bahaa Azem