OpenShift Bugs: OCPBUGS-20198

4.14/AWS: Machines using m4 instance types don't get network



      Description of problem

      Rehearsals of the hive e2e tests against 4.14 nightlies have been failing consistently. The failing section tests hive MachinePools, which generate and scale MachineSets on the spoke (target) cluster. The failure occurs at various points in the test where we scale up: one or more Machines hang in the Provisioned state, and the test times out after 15m waiting for the corresponding Node(s) to appear and become healthy.

      I reproduced this locally and looked at the instances in the AWS console. The affected instances show 1/2 status checks failing; the failing check reports "Instance reachability check failed".
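      The console checks can also be read from the CLI with `aws ec2 describe-instance-status --instance-ids <id>`. The Go program below is a sketch that assumes JSON output. It picks out instances whose instance-level (not system-level) reachability check failed, which the console labels "Instance reachability check failed". The struct types cover only the fields used here, and the sample payload and instance IDs are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal subset of the JSON printed by `aws ec2 describe-instance-status`.
type statusDetail struct {
	Name   string `json:"Name"`
	Status string `json:"Status"`
}

type statusSummary struct {
	Status  string         `json:"Status"`
	Details []statusDetail `json:"Details"`
}

type instanceStatus struct {
	InstanceID     string        `json:"InstanceId"`
	SystemStatus   statusSummary `json:"SystemStatus"`
	InstanceStatus statusSummary `json:"InstanceStatus"`
}

type describeOutput struct {
	InstanceStatuses []instanceStatus `json:"InstanceStatuses"`
}

// unreachableInstances returns the IDs of instances whose instance-level
// reachability check reports "failed".
func unreachableInstances(raw []byte) ([]string, error) {
	var out describeOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	var ids []string
	for _, s := range out.InstanceStatuses {
		for _, d := range s.InstanceStatus.Details {
			if d.Name == "reachability" && d.Status == "failed" {
				ids = append(ids, s.InstanceID)
			}
		}
	}
	return ids, nil
}

func main() {
	// Illustrative sample with made-up IDs, not captured from the failing cluster.
	sample := []byte(`{"InstanceStatuses":[
	  {"InstanceId":"i-good",
	   "SystemStatus":{"Status":"ok","Details":[{"Name":"reachability","Status":"passed"}]},
	   "InstanceStatus":{"Status":"ok","Details":[{"Name":"reachability","Status":"passed"}]}},
	  {"InstanceId":"i-bad",
	   "SystemStatus":{"Status":"ok","Details":[{"Name":"reachability","Status":"passed"}]},
	   "InstanceStatus":{"Status":"impaired","Details":[{"Name":"reachability","Status":"failed"}]}}]}`)
	ids, err := unreachableInstances(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids) // [i-bad]
}
```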

      I'm attaching serial console logs from a bad instance as well as a good one. (These are the first I've ever captured: I don't know how to read them, or even whether I captured them correctly. Please let me know if you need them recaptured or need something different.)

      Version-Release number of selected component (if applicable)

      4.14 nightlies (candidate stream) for at least a couple of months.

      How reproducible

      Very. I won't say 100%, but it's close.

      Steps to Reproduce

      Via hive:
      1. Provision a spoke on AWS using a 4.14 nightly release image
      2. Set CLUSTER_NAME and CLUSTER_NAMESPACE env vars
      3. Run go test ./test/e2e/postinstall/machinesets/...

      The test will usually fail, reporting a timeout while waiting for nodes.
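      The hive steps above amount to a `go test` run with two environment variables set. Below is a minimal Go sketch of that invocation. It assumes it runs from the root of a hive checkout. `e2eCommand` is a hypothetical helper, not part of hive, and it refuses to build the command when either variable is missing. The child process inherits the current environment because `cmd.Env` is left nil.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// machineSetsE2EPkg is the package pattern from the reproduction steps.
const machineSetsE2EPkg = "./test/e2e/postinstall/machinesets/..."

// e2eCommand builds (but does not run) the `go test` invocation, failing early
// if either required environment variable is unset or empty.
func e2eCommand(getenv func(string) string) (*exec.Cmd, error) {
	for _, name := range []string{"CLUSTER_NAME", "CLUSTER_NAMESPACE"} {
		if getenv(name) == "" {
			return nil, fmt.Errorf("%s must be set", name)
		}
	}
	// cmd.Env stays nil, so the child inherits this process's environment.
	return exec.Command("go", "test", machineSetsE2EPkg), nil
}

func main() {
	cmd, err := e2eCommand(os.Getenv)
	if err != nil {
		fmt.Println(err)
		return
	}
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Println("e2e tests failed:", err)
	}
}
```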

      Without hive (speculative):
      1. Install a 4.14 cluster on AWS
      2. Scale the default worker pool down to 1 replica.
      3. Scale it back up to 3 replicas
      4. Watch machines/nodes. One or more will get stuck.

      Actual results

      Nodes don't become healthy.

      Expected results

      Nodes become healthy.

      Additional info

      I have an environment set up where I can reproduce this, usually within tens of minutes. Let me know if you want access.

      People: Jonathan Lebon (jlebon1@redhat.com), Eric Fried (efried.openshift), Michael Nguyen