OCPBUGS-37463 (OpenShift Bugs)

BM IPI node replacement fails with introspection timeout error


    • Important
    • None
    • 1
    • Metal Platform 258
    • 1
    • Rejected
    • False
    • Customer Escalated

      Description of problem:

      While replacing master nodes, two of the replacement nodes get stuck in inspection and end in an introspection error.
      
      Env:
      BM IPI 3-node cluster
      
      Issue:
      Two of the master nodes failed because the disks on these nodes crashed.
      After the disks were replaced, we tried to restore the cluster by following the steps in the documentation [1].
      
      [1] https://docs.openshift.com/container-platform/4.12/backup_and_restore/control_plane_backup_and_restore/replacing-unhealthy-etcd-member.html#restore-replace-stopped-baremetal-etcd-member_replacing-unhealthy-etcd-member
          
      
      Error:
      
      $ 1 DEBUG ironic_inspector.node_cache [-] [node: node-id state error] Committing fields: {'finished_at': datetime.datetime(2024, 7, 22, 13, 40, 11, 506412), 'error': 'Introspection timeout'} _commit /usr/lib/python3.9/site-packages/ironic_inspector/node_cache.py:152
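When scanning a long ironic-inspector log for lines like the one above, a small filter can help. The following is only a sketch: the regular expression and the function name `parse_inspector_error` are assumptions written for this report and are not part of ironic-inspector; the log line is the one quoted above, with the trailing colour code removed.

```python
import re

# Log line copied from this report (trailing ANSI colour code removed).
LOG_LINE = (
    "1 DEBUG ironic_inspector.node_cache [-] [node: node-id state error] "
    "Committing fields: {'finished_at': datetime.datetime(2024, 7, 22, 13, 40, 11, 506412), "
    "'error': 'Introspection timeout'} _commit "
    "/usr/lib/python3.9/site-packages/ironic_inspector/node_cache.py:152"
)

# Hypothetical pattern: captures the node identifier, the node state,
# and the error string from a "Committing fields" line.
PATTERN = re.compile(
    r"\[node: (?P<node>\S+) state (?P<state>\w+)\].*?'error': '(?P<error>[^']*)'"
)


def parse_inspector_error(line):
    """Return (node, state, error) if the line matches, otherwise None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group("node"), match.group("state"), match.group("error")


print(parse_inspector_error(LOG_LINE))
```

Lines that do not carry an `'error'` field return `None`, so the function can be applied to every line of a log file and the non-matching results discarded.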
      
      
      

      Version-Release number of selected component (if applicable):

      4.12.z    

      How reproducible:

      N/A    

      Steps to Reproduce:

          1. Follow the documented steps [1] to replace the master node
          2. Create the BareMetalHost (BMH) CR
          3. Apply the BMH CR
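For readers unfamiliar with the BMH CR mentioned in the steps, the following sketch builds a minimal BareMetalHost manifest. It is not the CR used in this case: the host name, BMC address, MAC address, and secret name are placeholder assumptions, and `build_bmh_manifest` is a hypothetical helper. Only the `metal3.io/v1alpha1` BareMetalHost field names are taken from the Metal3 API.

```python
import json


def build_bmh_manifest(name, bmc_address, boot_mac, secret_name):
    """Build a minimal BareMetalHost manifest as a plain dict."""
    return {
        "apiVersion": "metal3.io/v1alpha1",
        "kind": "BareMetalHost",
        "metadata": {"name": name, "namespace": "openshift-machine-api"},
        "spec": {
            "online": True,
            "bootMACAddress": boot_mac,
            "bmc": {
                "address": bmc_address,
                # Name of a Secret holding the BMC username and password.
                "credentialsName": secret_name,
            },
        },
    }


# All values below are placeholders, not data from the affected cluster.
manifest = build_bmh_manifest(
    name="example-master-2",
    bmc_address="redfish://192.0.2.10/redfish/v1/Systems/1",
    boot_mac="52:54:00:00:00:01",
    secret_name="example-master-2-bmc-secret",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `oc apply -f`, assuming the referenced BMC credentials Secret already exists in the same namespace.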
          

      Actual results:

      Introspection fails with a timeout error.

      Expected results:

        BMH inspection should complete successfully.

      Additional info:

          

              rh-ee-masghar Mahnoor Asghar
              rhn-support-chdeshpa Chinmay Deshpande
              Jad Haj Yahya
              Chinmay Deshpande