OpenShift Bugs / OCPBUGS-15507

When deploying a new OpenShift cluster, some nodes have a network issue


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • None
    • 4.11.z
    • RHCOS
    • No
    • 3
    • Sprint 239 - OSIntegration, Sprint 240 - OSIntegration
    • 2
    • Rejected
    • False

      Description of problem:

      My customer is reporting the following issue:
      
      We built a pipeline to deploy an OpenShift cluster, which was working fine until a few weeks ago.
      
      When we deploy a new OpenShift cluster, about 30% of the nodes (worker, storage and infra) get stuck in an endless loop with this error: "Get error: Get 'https://api-int.xxx.x.xxx.xx:22623/config/worker' : dial tcp: lookup api-int.xxx.xxxx.xxxx.xx on [::1]:53: read udp [::1]:35946->[::1]:53: read: connection refused"
      This message scrolls endlessly on the console (see the attached console screenshot).
      
      When we send Ctrl-Alt-Del from the console of a hanging node, the node reboots and continues successfully. The installation then completes normally for that node.
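
      A note on reading the error above: the `on [::1]:53` part names the DNS resolver that was queried, here the IPv6 loopback address on the node itself. A resolver pointing at loopback can be a sign that no upstream nameserver was configured when the lookup ran, but that is an assumption about the cause, not something this report confirms. The following is a minimal Python sketch for flagging such lines in collected console logs. It assumes the message keeps the `lookup <host> on <resolver>:<port>` shape quoted above; the function names and the sample hostname `api-int.example.invalid` are hypothetical.

```python
import ipaddress
import re

# Matches the "lookup <host> on <resolver>:<port>" fragment of the error text.
# The resolver may be a bracketed IPv6 literal ([::1]) or a plain IPv4 address.
LOOKUP_RE = re.compile(
    r"lookup (?P<host>\S+) on (?P<addr>\[[^\]]+\]|[^\s:]+):(?P<port>\d+)"
)


def parse_lookup_error(message):
    """Return (host, resolver_address, port), or None if the shape differs."""
    match = LOOKUP_RE.search(message)
    if match is None:
        return None
    addr = match.group("addr").strip("[]")  # drop IPv6 brackets if present
    return match.group("host"), addr, int(match.group("port"))


def resolver_is_loopback(message):
    """True if the resolver named in the message is a loopback address."""
    parsed = parse_lookup_error(message)
    if parsed is None:
        return False
    try:
        return ipaddress.ip_address(parsed[1]).is_loopback
    except ValueError:
        # The resolver field was not an IP literal; treat as not loopback.
        return False


# Hypothetical sample line modelled on the message in this report.
sample = (
    "dial tcp: lookup api-int.example.invalid on [::1]:53: "
    "read udp [::1]:35946->[::1]:53: read: connection refused"
)

if __name__ == "__main__":
    print(parse_lookup_error(sample))
    print(resolver_is_loopback(sample))
```

      Running such a check over the console output of each affected node would show whether every stuck node was querying loopback, which narrows the question to how the node's resolver was configured at first boot.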

      Version-Release number of selected component (if applicable):

      4.11.26

      How reproducible:

      Difficult to reproduce; the failure occurs randomly

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      Some nodes get stuck at first boot

      Expected results:

      All nodes to be deployed without issue

      Additional info:

       

            rhn-gps-dmabe Dusty Mabe
            rhn-support-andbartl Andy Bartlett
            Michael Nguyen Michael Nguyen
