OpenShift Bugs / OCPBUGS-64765

TNF assisted-service installation stuck due to address/port conflict


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.20
    • Two Node Fencing
    • Quality / Stability / Reliability
    • Critical
    • OCPEDGE Sprint 280, OCPEDGE Sprint 281

      Description of problem:

          The installation of a TNF cluster using the agent-based installer gets stuck because Pacemaker fails to synchronize entities.
          The journalctl output shows failures related to a missing revision.json file.

      Version-Release number of selected component (if applicable):
      
      

      How reproducible:

          

      Steps to Reproduce:

      1. Install a cluster to act as the Hub cluster.
      2. Install MCE.
      3. Provision the infrastructure for a new spoke cluster.
      4. Apply the manifests that deploy a TNF cluster (javier-1_manifests.tgz).
      5. After the nodes are installed, the ACI status is "finalizing", but the Pacemaker status on the hosts shows failed resources (see "Actual results").
      
      

      Actual results:

          pcs status:
      
      Full List of Resources:
        * Clone Set: kubelet-clone [kubelet]:
          * Started: [ javier-master-1-0 javier-master-1-1 ]
        * javier-master-1-0_redfish    (stonith:fence_redfish):     Started javier-master-1-0
        * javier-master-1-1_redfish    (stonith:fence_redfish):     Started javier-master-1-1
        * Clone Set: etcd-clone [etcd]:
          * Stopped: [ javier-master-1-0 javier-master-1-1 ]

      Failed Resource Actions:
        * etcd start on javier-master-1-1 returned 'error' (podman failed to launch container (error code: 1)) at Thu Nov  6 15:48:25 2025 after 2m6.080s
        * etcd start on javier-master-1-0 could not be executed (Timed Out: Resource agent did not complete within 10m) at Thu Nov  6 15:48:25 2025 after 10m2ms
      

      Expected results:

          pcs status shows no failed resources and the TNF cluster is deployed successfully.

      Additional info:

      As a manual workaround, running "sudo pcs resource cleanup" on a node allows the installation to continue.
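
The workaround could be scripted along the following lines. This is only a sketch, not part of the reported setup: it assumes `pcs status` prints a "Failed Resource Actions:" section with bullet lines as in the output above, and the helper names `failed_resource_actions` and `cleanup_if_failed` are hypothetical.

```python
import subprocess

FAILED_HEADER = "Failed Resource Actions:"


def failed_resource_actions(pcs_status_text):
    """Return the bullet lines listed under 'Failed Resource Actions:'."""
    failures = []
    in_section = False
    for raw in pcs_status_text.splitlines():
        line = raw.strip()
        if FAILED_HEADER in line:
            # Substring match: in pasted output the header may be fused
            # to the end of the previous line.
            in_section = True
            continue
        if in_section:
            if line.startswith("* "):
                failures.append(line[2:])
            elif line:
                # Any other non-empty line (e.g. the next section header)
                # ends the failed-actions section.
                break
    return failures


def cleanup_if_failed():
    """Run the manual workaround only when pcs reports failed actions.

    Assumption: this runs on a cluster node where sudo and pcs are available.
    """
    status = subprocess.run(
        ["sudo", "pcs", "status"], capture_output=True, text=True, check=True
    ).stdout
    failures = failed_resource_actions(status)
    if failures:
        subprocess.run(["sudo", "pcs", "resource", "cleanup"], check=True)
    return failures
```

Calling `cleanup_if_failed()` on one of the nodes returns the failed actions it found, so they can be logged before the cleanup clears them. The cleanup does not address why the etcd resource failed to start in the first place.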

       

       

       

              rh-ee-pfontani Pablo Fontanilla
              frmoreno Francisco Javier Moreno
              Douglas Hensel Douglas Hensel