OCP Technical Release Team · TRT-1519

Metal IPI IPv6 Job Intermittently Failing to Get One Node/Machine


    • Type: Story
    • Resolution: Done
    • Priority: Blocker

      test=node count should match or exceed machine count

      This test has been degraded since Wednesday.

      The failures look like:

      Timed out waiting for node count (5) to equal or exceed machine count (6).

      NAMESPACE               NAME                          PHASE         TYPE   REGION   ZONE   AGE
      openshift-machine-api   ostest-8m586-master-0         Running                              66m
      openshift-machine-api   ostest-8m586-master-1         Running                              66m
      openshift-machine-api   ostest-8m586-master-2         Running                              66m
      openshift-machine-api   ostest-8m586-worker-0-6j8v5   Running                              53m
      openshift-machine-api   ostest-8m586-worker-0-ctkwz   Running                              53m
      openshift-machine-api   ostest-8m586-worker-0-hq2pw   Provisioned                          53m

      NAME                                 STATUS   ROLES                  AGE   VERSION
      master-0.ostest.test.metalkube.org   Ready    control-plane,master   56m   v1.29.1+2f773e8
      master-1.ostest.test.metalkube.org   Ready    control-plane,master   57m   v1.29.1+2f773e8
      master-2.ostest.test.metalkube.org   Ready    control-plane,master   56m   v1.29.1+2f773e8
      worker-0.ostest.test.metalkube.org   Ready    worker                 31m   v1.29.1+2f773e8
      worker-2.ostest.test.metalkube.org   Ready    worker                 31m   v1.29.1+2f773e8

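      For triage against a live cluster in this state, the check can be approximated by hand. A minimal sketch, assuming `oc` is on PATH and KUBECONFIG points at the failing cluster; this is not the origin test's actual implementation:

#!/usr/bin/env python3
"""Rough re-check of 'node count should match or exceed machine count'.
Triage sketch only: assumes `oc` is on PATH and KUBECONFIG points at the cluster."""
import json
import subprocess


def oc_items(*args):
    # Run `oc get ... -o json` and return the item list.
    out = subprocess.run(["oc", "get", *args, "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)["items"]


machines = oc_items("machines.machine.openshift.io", "-n", "openshift-machine-api")
nodes = oc_items("nodes")
print(f"machines={len(machines)} nodes={len(nodes)}")
if len(nodes) < len(machines):
    print("FAIL: node count is below machine count")
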
      Checking six examples, it does appear it's always just one Machine stuck in Provisioned, never more than that, which makes a capacity issue feel less likely.
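
      To isolate the odd Machine out in a given run, the standard Machine API status fields can be filtered directly. Another hedged sketch under the same assumptions: a Machine that has joined the cluster normally reports status.phase "Running" and carries a status.nodeRef.

#!/usr/bin/env python3
"""Print Machines that never reached Running or never got a Node.
Triage sketch only: assumes `oc` is on PATH and KUBECONFIG is set."""
import json
import subprocess

out = subprocess.run(
    ["oc", "get", "machines.machine.openshift.io",
     "-n", "openshift-machine-api", "-o", "json"],
    check=True, capture_output=True, text=True).stdout

for m in json.loads(out)["items"]:
    status = m.get("status", {})
    # Healthy Machines report phase "Running" and a nodeRef; the stuck worker
    # in these runs sits in "Provisioned" with no nodeRef.
    if status.get("phase") != "Running" or "nodeRef" not in status:
        print(m["metadata"]["name"], status.get("phase"))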

      This is intermittently failing 4.16 nightly payloads.

      The Sippy link above indicates it's hitting two IPv6 jobs:

      periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-ovn-ipv6
      periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-serial-ovn-ipv6

      Sample job run we're looking at: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-ovn-ipv6/1758401103050838016

      worker-2 is the affected system in this case, but unfortunately, due to whatever is wrong, we get no systemd logs from it.

            Assignee / Reporter: Devan Goodwin (rhn-engineering-dgoodwin)

              Created:
              Updated:
              Resolved: