OCP Technical Release Team · TRT-1519

Metal IPI IPv6 Job Intermittently Failing to Get One Node/Machine


    • Type: Story
    • Resolution: Done
    • Priority: Blocker

      test=node count should match or exceed machine count

      This test has been degraded since Wednesday.

      The failures look like:

      Timed out waiting for node count (5) to equal or exceed machine count (6).

      NAMESPACE               NAME                          PHASE         TYPE   REGION   ZONE   AGE
      openshift-machine-api   ostest-8m586-master-0         Running                              66m
      openshift-machine-api   ostest-8m586-master-1         Running                              66m
      openshift-machine-api   ostest-8m586-master-2         Running                              66m
      openshift-machine-api   ostest-8m586-worker-0-6j8v5   Running                              53m
      openshift-machine-api   ostest-8m586-worker-0-ctkwz   Running                              53m
      openshift-machine-api   ostest-8m586-worker-0-hq2pw   Provisioned                          53m

      NAME                                 STATUS   ROLES                  AGE   VERSION
      master-0.ostest.test.metalkube.org   Ready    control-plane,master   56m   v1.29.1+2f773e8
      master-1.ostest.test.metalkube.org   Ready    control-plane,master   57m   v1.29.1+2f773e8
      master-2.ostest.test.metalkube.org   Ready    control-plane,master   56m   v1.29.1+2f773e8
      worker-0.ostest.test.metalkube.org   Ready    worker                 31m   v1.29.1+2f773e8
      worker-2.ostest.test.metalkube.org   Ready    worker                 31m   v1.29.1+2f773e8

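      For triage against a live cluster in this state, the check can be approximated by hand. A minimal sketch, assuming `oc` is on PATH and KUBECONFIG points at the failing cluster; this is not the origin test's actual implementation:

#!/usr/bin/env python3
"""Rough re-check of 'node count should match or exceed machine count'.
Triage sketch only: assumes `oc` is on PATH and KUBECONFIG points at the cluster."""
import json
import subprocess


def oc_items(*args):
    # Run `oc get ... -o json` and return the item list.
    out = subprocess.run(["oc", "get", *args, "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)["items"]


machines = oc_items("machines.machine.openshift.io", "-n", "openshift-machine-api")
nodes = oc_items("nodes")
print(f"machines={len(machines)} nodes={len(nodes)}")
if len(nodes) < len(machines):
    print("FAIL: node count is below machine count")
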
      Checking six examples, it does appear it's always just one Machine stuck in Provisioned, never more than that, which makes a capacity issue feel less likely.
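
      To isolate the odd Machine out in a given run, the standard Machine API status fields can be filtered directly. Another hedged sketch under the same assumptions: a Machine that has joined the cluster normally reports status.phase "Running" and carries a status.nodeRef.

#!/usr/bin/env python3
"""Print Machines that never reached Running or never got a Node.
Triage sketch only: assumes `oc` is on PATH and KUBECONFIG is set."""
import json
import subprocess

out = subprocess.run(
    ["oc", "get", "machines.machine.openshift.io",
     "-n", "openshift-machine-api", "-o", "json"],
    check=True, capture_output=True, text=True).stdout

for m in json.loads(out)["items"]:
    status = m.get("status", {})
    # Healthy Machines report phase "Running" and a nodeRef; the stuck worker
    # in these runs sits in "Provisioned" with no nodeRef.
    if status.get("phase") != "Running" or "nodeRef" not in status:
        print(m["metadata"]["name"], status.get("phase"))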

      This is intermittently failing 4.16 nightly payloads.

      The Sippy link above indicates it's hitting two IPv6 jobs:

      periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-ovn-ipv6
      periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-serial-ovn-ipv6

      Sample job run we're looking at: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ipi-ovn-ipv6/1758401103050838016

      worker-2 is the affected system in this case, but unfortunately, due to whatever is wrong, we get no systemd logs from it.

            Assignee / Reporter: Devan Goodwin (rhn-engineering-dgoodwin)

              Created:
              Updated:
              Resolved: