Type: Story
Resolution: Done
Priority: Blocker
Test: node count should match or exceed machine count
This test has been degraded since Wednesday. The failures look like:
{
Timed out waiting for node count (5) to equal or exceed machine count (6).

NAMESPACE               NAME                          PHASE         TYPE   REGION   ZONE   AGE
openshift-machine-api   ostest-8m586-master-0         Running                              66m
openshift-machine-api   ostest-8m586-master-1         Running                              66m
openshift-machine-api   ostest-8m586-master-2         Running                              66m
openshift-machine-api   ostest-8m586-worker-0-6j8v5   Running                              53m
openshift-machine-api   ostest-8m586-worker-0-ctkwz   Running                              53m
openshift-machine-api   ostest-8m586-worker-0-hq2pw   Provisioned                          53m

NAME                                 STATUS   ROLES                  AGE   VERSION
master-0.ostest.test.metalkube.org   Ready    control-plane,master   56m   v1.29.1+2f773e8
master-1.ostest.test.metalkube.org   Ready    control-plane,master   57m   v1.29.1+2f773e8
master-2.ostest.test.metalkube.org   Ready    control-plane,master   56m   v1.29.1+2f773e8
worker-0.ostest.test.metalkube.org   Ready    worker                 31m   v1.29.1+2f773e8
worker-2.ostest.test.metalkube.org   Ready    worker                 31m   v1.29.1+2f773e8
}
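For context, a minimal sketch of what this test asserts: count the Machines in openshift-machine-api and compare against the number of Nodes. This is not the actual test code, only an illustration, and it assumes an authenticated oc session against the cluster.
{
# Sketch of the assertion: node count must equal or exceed machine count.
# Assumes `oc` is on PATH and logged in to the cluster (hypothetical setup).
import json
import subprocess

def count_items(cmd):
    # Parse `oc get ... -o json` output and count the items list.
    out = subprocess.check_output(cmd, text=True)
    return len(json.loads(out)["items"])

machines = count_items(["oc", "get", "machines.machine.openshift.io",
                        "-n", "openshift-machine-api", "-o", "json"])
nodes = count_items(["oc", "get", "nodes", "-o", "json"])

if nodes < machines:
    print(f"node count ({nodes}) has not reached machine count ({machines})")
}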
Checking six examples, it appears it is always exactly one Machine stuck in the Provisioned phase, never more than that, which makes capacity issues feel less likely.
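To spot the stuck Machine on a live cluster, one approach (again just a sketch, assuming the same authenticated oc session) is to list Machines whose status phase is not Running:
{
# List Machines not yet in the Running phase (e.g. stuck in Provisioned).
import json
import subprocess

out = subprocess.check_output(
    ["oc", "get", "machines.machine.openshift.io",
     "-n", "openshift-machine-api", "-o", "json"],
    text=True)
for machine in json.loads(out)["items"]:
    phase = machine.get("status", {}).get("phase", "<unknown>")
    if phase != "Running":
        # e.g. ostest-8m586-worker-0-hq2pw Provisioned
        print(machine["metadata"]["name"], phase)
}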
This is intermittently failing nightly payloads.
The Sippy link above indicates it's hitting two IPv6 jobs:
periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-ovn-ipv6
periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-serial-ovn-ipv6
Sample job run we're looking at: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-ovn-ipv6/1758401103050838016
worker-2 is the affected system in this case, but sadly, due to whatever is wrong, we get no systemd logs from it.