  1. Multiple Architecture Enablement
  2. MULTIARCH-2039

OCP 4.10 nightly build will fail to install if multiple NICs are defined on KVM nodes

Details

    • Bug
    • Resolution: Done
    • 4.10.0
    • 4.10
    • Multi-Arch
    • None
    • False
    • False
    • NEW
    • NEW

    Description

      Description of problem:

      An OCP 4.10 on Z cluster installation using OCP 4.10.0-0.nightly-s390x-2022-01-14-030142 with RHCOS build 410.84.202201132002-0 fails when two NICs are defined on the control plane nodes. The master nodes start and have network connectivity. However, their status shows NotReady, and the kubelet.service log on each master shows the following error:

      Jan 14 15:30:33 master-00.pok-106.ocptest.pok.stglabs.ibm.com hyperkube[2540]: E0114 15:30:33.585209 2540 pod_workers.go:918] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?" pod="openshift-multus/network-metrics-daemon-lq6jp" podUID=0ad24722-75d6-49f5-8ac1-e90f40095a7d

      Since the control plane nodes do not completely come online, the worker nodes will fail to boot and install RHCOS.
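
When triaging a captured kubelet log for this symptom, a small filter can list which pods report the missing CNI configuration. The following is a minimal sketch, not part of the original report: the function name `find_cni_errors` is hypothetical, and the pattern is taken only from the single log line quoted above, so other kubelet versions may word the message differently.

```python
import re

# Pattern derived from the kubelet error line quoted in this report (assumption:
# the message wording is the same on the build being checked).
CNI_NOT_READY = re.compile(
    r"NetworkPluginNotReady.*No CNI configuration file in (?P<dir>\S+?)\.\s"
)
POD_FIELD = re.compile(r'pod="([^"]+)"')


def find_cni_errors(log_text):
    """Return (pod, cni_dir) pairs for each 'No CNI configuration file' line."""
    hits = []
    for line in log_text.splitlines():
        match = CNI_NOT_READY.search(line)
        if not match:
            continue
        pod = POD_FIELD.search(line)
        hits.append((pod.group(1) if pod else None, match.group("dir")))
    return hits
```

The input could be, for example, the text saved from `journalctl -u kubelet.service` on a master node.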

      I have verified that this installation failure occurs with the following 4.10 nightly builds:

      4.10.0-0.nightly-s390x-2022-01-13-022003 & RHCOS 410.84.202201121602-0
      4.10.0-0.nightly-s390x-2022-01-05-011736 & RHCOS 410.84.202201041402-0
      4.10.0-0.nightly-s390x-2022-01-14-030142 & RHCOS 410.84.202201132002-0

      I have performed the same installation successfully using the following 4.9 and older 4.10 nightly builds:

      4.9.15 & RHCOS 4.9.0
      4.9.14 & RHCOS 4.9.0
      4.10.0-0.nightly-s390x-2021-12-09-171055 & RHCOS 410.84.202112062233-0
      4.10.0-0.nightly-s390x-2021-12-10-233457 & RHCOS 410.84.202112091602-0

      All of the above OCP installations use a networkType of OVNKubernetes in install-config.yaml.

      However, if I change the networkType to OpenShiftSDN, the OCP installation succeeds with all OCP 4.10 nightly builds.

      Version-Release number of selected component (if applicable):
      1. OCP 4.10 nightly build 4.10.0-0.nightly-s390x-2022-01-14-030142
      2. RHCOS build 410.84.202201132002-0

      How reproducible:
      Consistently reproducible.

      Steps to Reproduce:
      1. Attempt to install OCP 4.10 nightly build 4.10.0-0.nightly-s390x-2022-01-14-030142 with RHCOS 410.84.202201132002-0.
      2. Start the bootstrap, master (control plane), and worker (compute) nodes with multiple network interfaces defined.
      3. For example, pass two --network parameters and two IP addresses in --extra-args within the virt-install command.
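
To make step 3 concrete, the sketch below builds only the NIC-related fragment of a virt-install argument list: one `--network` option per interface and one dracut-style `ip=` entry per interface inside `--extra-args`. It is an illustration under assumptions, not the reporter's actual command: the helper name `multi_nic_args`, the libvirt network names, the addresses, the hostname, and the interface names `enc1`/`enc2` are all hypothetical, and a real invocation needs further options that the report does not show.

```python
def multi_nic_args(nics):
    """Build the NIC-related part of a virt-install argument list.

    Each entry in nics is a dict with the (hypothetical) keys
    network, ip, gateway, netmask, hostname, and iface.
    """
    args = []
    ip_entries = []
    for nic in nics:
        # One --network option per interface.
        args += ["--network", "network={}".format(nic["network"])]
        # Dracut static syntax: ip=<ip>::<gateway>:<netmask>:<hostname>:<iface>:none
        ip_entries.append(
            "ip={ip}::{gateway}:{netmask}:{hostname}:{iface}:none".format(**nic)
        )
    # All kernel arguments travel in a single --extra-args string.
    args += ["--extra-args", " ".join(ip_entries)]
    return args


# Hypothetical values for a two-NIC control plane node.
example = multi_nic_args([
    {"network": "net-a", "ip": "192.0.2.10", "gateway": "192.0.2.1",
     "netmask": "255.255.255.0", "hostname": "master-0", "iface": "enc1"},
    # Second NIC left without a gateway so only one default route is defined.
    {"network": "net-b", "ip": "198.51.100.10", "gateway": "",
     "netmask": "255.255.255.0", "hostname": "master-0", "iface": "enc2"},
])
```

The resulting list is meant to be appended to the rest of the virt-install arguments before the command is run.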

      Actual results:
      Bootstrap and master (control plane) nodes will boot. Master nodes will show a status of NotReady and worker (compute) nodes will fail to boot and install RHCOS.

      Expected results:
      The bootstrap, master (control plane), and worker (compute) nodes should all install the RHCOS build successfully and become Ready.

      Additional info:
      I have attached the logs from bootstrap.service (bootstrap-0) and kubelet.service (master-0) for a failed installation that uses the OVNKubernetes networkType.

            People

              rhn-engineering-dgilmore Dennis Gilmore (Inactive)
              chanphil Philip Chan (Inactive)