OCPBUGS-3289 (OpenShift Bugs)

[IBMCloud] Worker machines unreachable during initial bring up

    • Important
    • None
    • Proposed
    • False

      This is a clone of issue OCPBUGS-1327. The following is the description of the original issue:

      See this comment for some updated information

      Description of problem:
      During IPI installation on IBM Cloud (x86_64), some of the worker machines have been seen to have no network connectivity during their initial bootup. The issue was investigated together with IBM Cloud VPC, but by all appearances the virtualization layer is working correctly.

      Unfortunately, because the affected worker machines have no network traffic, Ignition is stuck and the machines cannot be accessed: neither SSH nor console login is available to collect logs or perform any testing on them.

      The only content available is the console output, which shows Ignition stuck due to the network issue.

      Version-Release number of selected component (if applicable):
      4.12.0

      How reproducible:
      About 60%

      Steps to Reproduce:
      1. Create an IPI cluster on IBM Cloud
      2. Wait for the worker machines to be provisioned; the IPI installation fails while waiting on the machine-api operator
      3. Check the console of the worker machines that fail to report in to the cluster (in this case, 2 of 3 failed)
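
To find which worker machines need their console checked in step 3, one option is to list the Machine objects that never received a node reference. The following is only a sketch, not part of the original report: it assumes the JSON comes from `oc get machines -n openshift-machine-api -o json` and that a machine which has joined the cluster carries `status.nodeRef`. The machine names in the sample data are hypothetical.

```python
import json
from typing import List


def machines_without_node(machine_list: dict) -> List[str]:
    """Return the names of Machine objects that have no status.nodeRef.

    Assumption: a Machine whose node never registered with the cluster
    (for example, because Ignition is stuck) has no nodeRef in its status.
    """
    stuck = []
    for item in machine_list.get("items", []):
        status = item.get("status") or {}
        if not status.get("nodeRef"):
            stuck.append(item["metadata"]["name"])
    return stuck


# Hypothetical sample shaped like the output of
# `oc get machines -n openshift-machine-api -o json` (heavily trimmed).
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"name": "example-worker-1"},
     "status": {"phase": "Provisioned"}},
    {"metadata": {"name": "example-worker-2"},
     "status": {"phase": "Provisioned"}},
    {"metadata": {"name": "example-worker-3"},
     "status": {"phase": "Running",
                "nodeRef": {"kind": "Node", "name": "example-worker-3"}}}
  ]
}
""")

if __name__ == "__main__":
    for name in machines_without_node(SAMPLE):
        print(name)
```

The names it prints are the machines whose console output is worth inspecting in the IBM Cloud console. A missing `nodeRef` only indicates that the node has not joined; it does not by itself confirm the network issue described here.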

      Actual results:
      IPI creation failed while waiting for the machine-api operator to complete deployment of all worker nodes

      Expected results:
      Successful IPI creation on IBM Cloud

      Additional info:
      As stated, investigation was performed by IBM Cloud VPC, but no further investigation could be performed since no access to these worker machines is available. Any further details that could be provided to help identify the issue would be helpful.

      This appears to have become more prominent recently as well, causing concern for IBM Cloud's IPI GA support on the 4.12 release.

      The only known way to restore network connectivity is to reboot the machine. Rebooting loses the Ignition bring-up (I assume it must then be triggered manually), so in the case of IPI it isn't a great mitigation.

            jeffbnowicki Jeff Nowicki
            openshift-crt-jira-prow OpenShift Prow Bot
            May Xu May Xu