-
Bug
-
Resolution: Done
-
Critical
-
None
-
4.14.0, 4.15.0
-
Moderate
-
No
-
3
-
Rejected
-
False
-
Description of problem
Rehearsals of the hive e2e tests against 4.14 nightlies have been failing consistently. The failing section tests hive MachinePools, which generate and scale MachineSets on the spoke (target) cluster. The failure can happen at any of several points in this test where we scale up: one or more Machines hang in the Provisioned state, and the test times out after 15m waiting for the corresponding Node(s) to appear and become healthy.
I reproduced this locally and looked at the instances in the AWS console. They show 1 of 2 status checks failing; the failing check says "Instance reachability check failed".
I'm attaching serial console logs from a bad instance as well as a good one. (These are my first ever: I don't know how to read them, or even if I captured them correctly. Please let me know if you need something else/again/different.)
Version-Release number of selected component (if applicable)
4.14 nightlies (candidate stream) for at least a couple of months.
How reproducible:
Very. I won't say 100%, but it's close.
Steps to Reproduce
Via hive:
1. Provision a spoke on AWS using a 4.14 nightly release image
2. Set CLUSTER_NAME and CLUSTER_NAMESPACE env vars
3. Run go test ./test/e2e/postinstall/machinesets/...
Test will (usually) fail, complaining of timeout waiting for nodes.
Without hive (speculative):
1. Install a 4.14 cluster on AWS
2. Scale the default worker pool down to 1 replica.
3. Scale it back up to 3 replicas
4. Watch machines/nodes. One or more will get stuck.
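The speculative steps 2-4 above could be sketched with `oc` roughly as follows. This is an untested outline, not a verified reproducer. It assumes `oc` is logged in to the affected cluster, that the "default worker pool" maps to a single worker MachineSet in the `openshift-machine-api` namespace, and that `oc get machines --no-headers` prints the phase in the second column. The function names and the `MACHINESET` variable are placeholders introduced here.

```shell
#!/usr/bin/env bash
# Sketch only: helper functions for the "without hive" reproduction.
# Nothing runs until a function is called.
NS=openshift-machine-api

# Steps 2 and 3: set the replica count of one worker MachineSet.
# $1 = MachineSet name (placeholder, look it up with `oc get machinesets -n "$NS"`)
# $2 = desired replicas
scale_machineset() {
  oc scale machineset "$1" -n "$NS" --replicas="$2"
}

# Step 4: print the names of Machines whose phase is Provisioned.
# Reads `oc get machines --no-headers` style rows on stdin,
# assuming the columns are NAME PHASE TYPE REGION ZONE AGE.
provisioned_machines() {
  awk '$2 == "Provisioned" { print $1 }'
}

# Example session (MACHINESET is a placeholder):
#   scale_machineset "$MACHINESET" 1
#   # wait for the extra Machines to be deleted, then:
#   scale_machineset "$MACHINESET" 3
#   oc get machines -n "$NS" --no-headers | provisioned_machines
```

A Machine that still shows up in the `provisioned_machines` output well after scale-up, with no matching entry in `oc get nodes`, would correspond to the stuck state described above.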
Actual results
Nodes don't become healthy.
Expected results
Nodes become healthy.
Additional info
I have an environment set up where I can reproduce this, usually within tens of minutes. Let me know if you want access.
- is blocked by
-
OCPBUGS-20356 [4.15] Bootimage bump tracker
- Closed
- is cloned by
-
OCPBUGS-17154 4.14/AWS: Machines using m4 instance types don't get network
- Closed
- is depended on by
-
OCPBUGS-17154 4.14/AWS: Machines using m4 instance types don't get network
- Closed