OpenShift Bugs / OCPBUGS-57368

On GCP, the cluster becomes unreachable after scaling up to 400 worker nodes


    • Quality / Stability / Reliability
    • Moderate

      Description of problem:

      After scaling up to 400 worker nodes on GCP, the cluster becomes unreachable once all machines are Running. Masters are n2-standard-4; workers are n2-standard-2.

      Version-Release number of selected component (if applicable):

      Always

      How reproducible:

      Always

      Steps to Reproduce:

      1. Set up a cluster on GCP with n2-standard-4 masters and n2-standard-2 workers.
      2. Scale the worker machinesets up to a total of 400 replicas.
      3. After all machines are Running, run oc commands.
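To tell when step 3 applies, the machine phases can be tallied in one pass instead of one grep per phase. The following is a minimal sketch, not part of the original report: `count_phases` is a hypothetical helper, and it assumes the phase is the second column of the listing and that the first line is a header.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally machines per phase from a machine listing on stdin.
# Assumptions: line 1 is a header, and column 2 holds the phase.
count_phases() {
  awk 'NR > 1 && $2 != "" { n[$2]++ } END { for (p in n) print p, n[p] }' | sort
}

# Usage against a live cluster (not executed here). The namespace is an
# assumption; the report ran oc without -n.
#   oc get machine -n openshift-machine-api \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase | count_phases
```

Step 3 applies once the only phase listed is Running.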
          

      Actual results:

      All oc commands fail with "Unable to connect to the server: EOF".
      
      $ oc scale machineset zhsungcp-t9nql-worker-b --replicas 200     [15:39:41] 
      oc scale machineset zhsungcp-t9nql-worker-c --replicas 100
      oc scale machineset zhsungcp-t9nql-worker-d --replicas 100
      machineset.machine.openshift.io/zhsungcp-t9nql-worker-b scaled
      machineset.machine.openshift.io/zhsungcp-t9nql-worker-c scaled
      machineset.machine.openshift.io/zhsungcp-t9nql-worker-d scaled
      
      $ oc get machine | grep Provisioning | wc                [15:59:36]
      oc get machine | grep Provisioned | wc -l
      oc get machine | grep Running | wc -l
            18
            59
           332
      
      $ oc get machine | grep Provisioning | wc -l             [16:06:43]
      oc get machine | grep Provisioned | wc -l
      oc get machine | grep Running | wc -l
             0
             0
           403
      $ oc get csr | grep Pending | wc -l                   [16:08:14]
             0
      
      mapi_machine_phase_transition_seconds_sum{phase="Running"} 393510.21958385484
      mapi_machine_phase_transition_seconds_count{phase="Running"} 403
      
       $ oc get co                              [16:32:18]
      Unable to connect to the server: EOF
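For context on the two mapi_machine_phase_transition_seconds samples in the log above, dividing the sum by the count gives the mean per-machine value for the Running phase. The sketch below is an illustration added for this report, not a tool from the source: `avg_phase_seconds` is a hypothetical helper, and treating the metric as seconds taken to reach the phase is an assumption based on its name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mean seconds per observation for one phase, computed
# from the _sum and _count samples (Prometheus text format) read on stdin.
avg_phase_seconds() {
  local phase="$1"
  awk -v phase="$phase" '
    index($1, "phase=\"" phase "\"") == 0 { next }  # skip other phases
    index($1, "_sum{")   > 0 { sum = $2 }
    index($1, "_count{") > 0 { count = $2 }
    END { if (count > 0) printf "%.2f\n", sum / count; else exit 1 }
  '
}

# The two samples quoted in this report:
avg_phase_seconds Running <<'EOF'
mapi_machine_phase_transition_seconds_sum{phase="Running"} 393510.21958385484
mapi_machine_phase_transition_seconds_count{phase="Running"} 403
EOF
# prints 976.45
```

Under that assumption, the 403 observations average roughly 16 minutes each.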

      Expected results:

      The cluster is reachable

      Additional info:

          

              rh-ee-tbarberb Theo Barber-Bany
              rhn-support-zhsun Zhaohua Sun