OpenShift Virtualization / CNV-22000

[2137250] NNCP deployment flakiness when having multiple SriovNetworkNodePolicy


    • Medium
    • None

      +++ This bug was initially created as a clone of Bug #2087172 +++

      Description of problem:
      When multiple SriovNetworkNodePolicy objects exist for the same interface (each holding a different config), NNCP deployment sometimes fails with the following message:
      libnmstate.error.NmstateVerificationError
      Found VF ports count does not match desired 32, current is:
      NNCE cnv-qe-infra-17.cnvqe2.lab.eng.rdu2.redhat.com.static-ip-cnv-qe-infra-17.cnvqe2.lab.eng.rdu2.redhat.com: libnmstate.error.NmstateVerificationError"

      To clarify - applied VF ports count is 0, because one SriovNetworkNodePolicy sets desired to 0. The other policy sets desired to 32.
      Both policies attached.

      NNS info about said interface -
      [adi@fedora cnv-tests]$ oc get nns cnv-qe-infra-17.cnvqe2.lab.eng.rdu2.redhat.com -o yaml
      apiVersion: nmstate.io/v1beta1
      kind: NodeNetworkState
      ...
        - ethernet:
            auto-negotiation: false
            duplex: full
            speed: 10000
            sr-iov:
              total-vfs: 0
              vfs: []
          ipv4:
            address:
            - ip: 10.1.156.17
              prefix-length: 24
            auto-dns: true
            auto-gateway: true
            auto-route-table-id: 0
            auto-routes: true
            dhcp: true
            enabled: true
          ipv6:
            address:
            - ip: fe80::e643:4bff:feec:8400
              prefix-length: 64
            auto-dns: true
            auto-gateway: true
            auto-route-table-id: 0
            auto-routes: true
            autoconf: true
            dhcp: true
            enabled: true
          lldp:
            enabled: false
          mac-address: E4:43:4B:EC:84:00
          mtu: 1500
          name: eno1
          state: up
          type: ethernet
      ...

      Version-Release number of selected component (if applicable):
      kubernetes-nmstate-handler v4.10.1-12

      How reproducible:
      On any OpenShift cluster with CNV and the SRIOV operator.

      Steps to Reproduce:
      1. Deploy the attached SriovNetworkNodePolicy objects (the default one may already be applied when installing the SRIOV operator, in which case there is no need to apply it).
      2. Deploy the following NNCP (adjust values) -
      apiVersion: nmstate.io/v1
      kind: NodeNetworkConfigurationPolicy
      metadata:
        name: static-ip-cnv-qe-infra-18.cnvqe2.lab.eng.rdu2.redhat.com
      spec:
        desiredState:
          interfaces:
          - ipv4:
              address:
              - ip: 10.1.156.18
                prefix-length: 24
              auto-dns: true
              dhcp: false
              enabled: true
            ipv6:
              address:
              - ip: fe80::e643:4bff:feec:76d0
                prefix-length: 64
              auto-dns: true
              autoconf: false
              dhcp: false
              enabled: true
            name: eno1
            state: up
            type: ethernet
        nodeSelector:
          kubernetes.io/hostname: cnv-qe-infra-18.cnvqe2.lab.eng.rdu2.redhat.com

      Actual results:
      NNCP deployment fails with the following message:
      libnmstate.error.NmstateVerificationError
      Found VF ports count does not match desired 32, current is:
      NNCE cnv-qe-infra-17.cnvqe2.lab.eng.rdu2.redhat.com.static-ip-cnv-qe-infra-17.cnvqe2.lab.eng.rdu2.redhat.com: libnmstate.error.NmstateVerificationError"

      Expected results:
      NNCP deployment should be applied successfully.

      Additional info:
      Two points that this bug should address -
      1. Why should nmstate be concerned with the SRIOV config when it is not required to change it?
      2. Why isn't nmstate able to determine which of the two SRIOV policies represents the actual desired state? If the SRIOV operator was able to deploy both policies, nmstate shouldn't bother with this.

      — Additional comment from Quique Llorente on 2022-05-26 10:07:51 UTC —

      @azavalko Can you also add the full nmstate logs, either from the NNCE digest or from the handler pod logs?

      Clearly nmstate should not take SRIOV into account here. I remember we were having similar issues with vxlan + openshift-sdn; in the end they fixed it by ignoring vxlan if it is not part of the configuration. A similar solution should fix this.
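
The idea in the comment above (verify only what the policy actually configures) can be illustrated with a small sketch. This is not nmstate's implementation: `find_mismatches` is a hypothetical helper, and the state dictionaries are simplified, illustrative values loosely modelled on the YAML in this report.

```python
def find_mismatches(desired, current, path=""):
    """Compare only the keys named in `desired`; extra keys in `current` are ignored.

    Hypothetical helper for illustration only, not part of libnmstate.
    """
    mismatches = []
    for key, want in desired.items():
        where = f"{path}.{key}" if path else key
        have = current.get(key)
        if isinstance(want, dict):
            # Recurse into nested sections; a missing section counts as empty.
            nested = have if isinstance(have, dict) else {}
            mismatches.extend(find_mismatches(want, nested, where))
        elif want != have:
            mismatches.append((where, want, have))
    return mismatches


# Illustrative values: the policy only sets IP configuration on eno1.
desired = {"name": "eno1", "state": "up", "ipv4": {"enabled": True, "dhcp": False}}
current = {
    "name": "eno1",
    "state": "up",
    "ipv4": {"enabled": True, "dhcp": False, "auto-dns": True},
    "ethernet": {"sr-iov": {"total-vfs": 0, "vfs": []}},
}

# SR-IOV is not part of the desired state, so it is never inspected.
print(find_mismatches(desired, current))  # → []

# Only when the policy itself asks for VFs does the VF count matter.
with_sriov = dict(desired, ethernet={"sr-iov": {"total-vfs": 32}})
print(find_mismatches(with_sriov, current))
# → [('ethernet.sr-iov.total-vfs', 32, 0)]
```

With this kind of subset comparison, an NNCP that never mentions `sr-iov` cannot fail on a VF count that another operator manages.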

      — Additional comment from Adi Zavalkovsky on 2022-06-06 14:13:06 UTC —

      — Additional comment from Quique Llorente on 2022-07-04 06:07:48 UTC —

      @azavalko@redhat.com can you test this with nmstate scratch build azavalko@redhat.com ?

      — Additional comment from Quique Llorente on 2022-07-04 06:08:49 UTC —

      I mean this https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=46367733

      — Additional comment from Quique Llorente on 2022-07-13 10:00:25 UTC —

      @gris the scratch build https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=46367733 is missing python3-libnmstate

      — Additional comment from Petr Horáček on 2022-07-14 12:58:56 UTC —

      We have an RPM of nmstate that should fix it. We would like to install it on nmstate Pods, to verify that it resolves the issue first.

      The RPM build will expire and will get deleted in 10 days.

      @azavalko@redhat.com are you able to reproduce the issue? So we can confirm that the new RPM fixes it?

      — Additional comment from on 2022-07-14 13:56:56 UTC —

      The old RPM has expired. We now have a new one and will test it.

      — Additional comment from on 2022-07-14 14:28:59 UTC —

      Verified on both 4.10 and 4.11:
      Updated brew link:
      https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=46526028
      I installed buildArch (nmstate-1.2.1-4.el8_6.src.rpm, x86_64)

      I installed the new nmstate patch directly on the nmstate-handler pods (4.11 example):
      for i in $(oc get pod -n openshift-nmstate -l component=kubernetes-nmstate-handler --no-headers | awk '{print $1}'); do oc exec -n openshift-nmstate $i -- dnf install -y http://download.eng.bos.redhat.com/brewroot/work/tasks/6048/46526048/nmstate-1.2.1-4.el8_6.x86_64.rpm http://download.eng.bos.redhat.com/brewroot/work/tasks/6048/46526048/python3-libnmstate-1.2.1-4.el8_6.noarch.rpm http://download.eng.bos.redhat.com/brewroot/work/tasks/6048/46526048/nmstate-plugin-ovsdb-1.2.1-4.el8_6.noarch.rpm; done

      Clusters info:
      1. bm03-tlv2
      [cnv-qe-jenkins@cnvqe-01 cnv-tests]$ oc get clusterversion
      NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
      version 4.11.0-rc.1 True False 2d23h Cluster version is 4.11.0-rc.1

      Client Version: 4.11.0-rc.1
      Kustomize Version: v4.5.4
      Server Version: 4.11.0-rc.1
      Kubernetes Version: v1.24.0+2dd8bb1

      2. bm02-cnvqe2-rdu2
      cnv-qe-jenkins@cnv-qe-infra-01:~/anat/4.10/cnv-tests$ oc get clusterversion
      NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
      version 4.10.22 True False 5d22h Cluster version is 4.10.22

      Client Version: 4.9.42
      Server Version: 4.10.22
      Kubernetes Version: v1.23.5+3afdacb

      Not closing the Bugzilla because this verifies the proposed fix, not the actual released fix.

      — Additional comment from Adi Zavalkovsky on 2022-07-17 12:15:44 UTC —

      Removing needinfo flags.

      — Additional comment from Gris Ge on 2022-07-19 14:45:15 UTC —

      The RHEL 8.6 zstream for nmstate is approved at https://bugzilla.redhat.com/show_bug.cgi?id=2108632
      It will ship on 2022-08-02. Is that OK for CNV?

      — Additional comment from Gris Ge on 2022-07-20 01:48:39 UTC —

      Official build attached to errata https://errata.devel.redhat.com/advisory/98693 and signed.
      There is no code difference between previous scratch build and this build.

      — Additional comment from Petr Horáček on 2022-07-21 08:00:40 UTC —

      Thank you Gris for looking into this. The release date works for us.

      — Additional comment from Petr Horáček on 2022-07-21 08:01:31 UTC —

      Waiting until August 2 for the fix to become available in RHEL, so we can rebuild downstream images.

      — Additional comment from Petr Horáček on 2022-08-22 10:07:31 UTC —

      The fix should become available with nmstate-1.2.1-4.el8_6. The current released knmstate is still using nmstate-1.2.1-3.el8_6: https://catalog.redhat.com/software/containers/openshift4/ose-kubernetes-nmstate-handler-rhel8/5e97379dbed8bd66f83dffb0?tag=v4.11.0-202208020235.p0.ga6744d1.assembly.stream&push_date=1660126963000&container-tabs=packages

      — Additional comment from Petr Horáček on 2022-10-03 14:27:33 UTC —

      The fix should be available in the recent knmstate 4.11 builds

      — Additional comment from Yossi Segev on 2022-10-20 10:32:17 UTC —

      I installed a new BM cluster (OCP 4.11.9) with the latest knmstate, and it still uses nmstate-1.2.1-3.el8_6.x86_64.
      Can't verify this bug yet.
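
When checking whether a handler image already carries the fixed package, the comparison against `nmstate-1.2.1-4.el8_6` can be scripted. The sketch below is only an illustration: `parse_nvr` and `has_fix` are hypothetical helpers, the comparison is a simplified numeric one (not the full RPM version-comparison algorithm), and it assumes the package string has the `name-version-release.dist.arch` shape seen in this report, for example as printed by `rpm -q nmstate` inside a handler pod.

```python
import re


def parse_nvr(package):
    """Split 'nmstate-1.2.1-3.el8_6.x86_64' into ('nmstate', (1, 2, 1), 3).

    Simplified parsing: numeric dotted version and a leading numeric release.
    """
    match = re.match(
        r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)-(?P<release>\d+)\.", package
    )
    if match is None:
        raise ValueError(f"unrecognized package string: {package!r}")
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("name"), version, int(match.group("release"))


def has_fix(package, fixed="nmstate-1.2.1-4.el8_6.x86_64"):
    """Return True if `package` is the same package at or above the fixed build."""
    name, version, release = parse_nvr(package)
    fixed_name, fixed_version, fixed_release = parse_nvr(fixed)
    return name == fixed_name and (version, release) >= (fixed_version, fixed_release)


print(has_fix("nmstate-1.2.1-3.el8_6.x86_64"))  # → False
print(has_fix("nmstate-1.2.1-4.el8_6.x86_64"))  # → True
```

The dist tag (`el8_6`) is ignored here, so the result is only meaningful when comparing builds from the same RHEL stream.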

              phoracek@redhat.com Petr Horacek
              rh-ee-awax Anat Wax
              Anat Wax Anat Wax