Bug
Resolution: Duplicate
CLOSED
Medium
+++ This bug was initially created as a clone of Bug #2087172 +++
Description of problem:
When multiple SriovNetworkNodePolicy objects exist for the same interface (each holding a different config), NNCP deployment sometimes fails with the following message:
libnmstate.error.NmstateVerificationError
Found VF ports count does not match desired 32, current is:
NNCE cnv-qe-infra-17.cnvqe2.lab.eng.rdu2.redhat.com.static-ip-cnv-qe-infra-17.cnvqe2.lab.eng.rdu2.redhat.com: libnmstate.error.NmstateVerificationError"
To clarify - applied VF ports count is 0, because one SriovNetworkNodePolicy sets desired to 0. The other policy sets desired to 32.
Both policies attached.
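The mismatch described above can be illustrated with a small sketch. This is not libnmstate's code: the function name `verify_total_vfs`, the `VerificationError` class, and the dictionary shapes are assumptions that only mirror the `sr-iov` / `total-vfs` keys reported by NNS.

```python
class VerificationError(Exception):
    """Hypothetical stand-in for libnmstate.error.NmstateVerificationError."""


def _total_vfs(iface):
    """Read ethernet -> sr-iov -> total-vfs, or None when it is absent."""
    return iface.get("ethernet", {}).get("sr-iov", {}).get("total-vfs")


def verify_total_vfs(desired_iface, current_iface):
    """Raise when the VF count found on the node differs from the desired one."""
    desired = _total_vfs(desired_iface)
    if desired is None:
        return  # SR-IOV is not part of the desired state, nothing to verify
    current = _total_vfs(current_iface)
    if current != desired:
        raise VerificationError(
            f"VF count does not match desired {desired}, current is: {current}"
        )


# One policy asked for 32 VFs, the other for 0; the node currently reports 0.
desired = {"name": "eno1", "ethernet": {"sr-iov": {"total-vfs": 32}}}
current = {"name": "eno1", "ethernet": {"sr-iov": {"total-vfs": 0, "vfs": []}}}
```

Calling `verify_total_vfs(desired, current)` with these two dictionaries raises, which is the shape of failure reported here; a desired interface without an `sr-iov` key passes.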
NNS info about said interface -
[adi@fedora cnv-tests]$ oc get nns cnv-qe-infra-17.cnvqe2.lab.eng.rdu2.redhat.com -o yaml
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkState
...
    - ethernet:
        auto-negotiation: false
        duplex: full
        speed: 10000
        sr-iov:
          total-vfs: 0
          vfs: []
      ipv4:
        address:
        - ip: 10.1.156.17
          prefix-length: 24
        auto-dns: true
        auto-gateway: true
        auto-route-table-id: 0
        auto-routes: true
        dhcp: true
        enabled: true
      ipv6:
        address:
        - ip: fe80::e643:4bff:feec:8400
          prefix-length: 64
        auto-dns: true
        auto-gateway: true
        auto-route-table-id: 0
        auto-routes: true
        autoconf: true
        dhcp: true
        enabled: true
      lldp:
        enabled: false
      mac-address: E4:43:4B:EC:84:00
      mtu: 1500
      name: eno1
      state: up
      type: ethernet
...
Version-Release number of selected component (if applicable):
kubernetes-nmstate-handler v4.10.1-12
How reproducible:
On any OpenShift cluster with CNV and the SRIOV operator.
Steps to Reproduce:
1. Deploy the attached SriovNetworkNodePolicy objects (the default one may already be applied when installing the SRIOV operator, in which case there is no need to apply it).
2. Deploy following NNCP (Adjust values) -
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: static-ip-cnv-qe-infra-18.cnvqe2.lab.eng.rdu2.redhat.com
spec:
  desiredState:
    interfaces:
    - ipv4:
        address:
        - ip: 10.1.156.18
          prefix-length: 24
        auto-dns: true
        dhcp: false
        enabled: true
      ipv6:
        address:
        - ip: fe80::e643:4bff:feec:76d0
          prefix-length: 64
        auto-dns: true
        autoconf: false
        dhcp: false
        enabled: true
      name: eno1
      state: up
      type: ethernet
  nodeSelector:
    kubernetes.io/hostname: cnv-qe-infra-18.cnvqe2.lab.eng.rdu2.redhat.com
Actual results:
NNCP deployment fails with the following message:
libnmstate.error.NmstateVerificationError
Found VF ports count does not match desired 32, current is:
NNCE cnv-qe-infra-17.cnvqe2.lab.eng.rdu2.redhat.com.static-ip-cnv-qe-infra-17.cnvqe2.lab.eng.rdu2.redhat.com: libnmstate.error.NmstateVerificationError"
Expected results:
NNCP deployment should be applied successfully.
Additional info:
Two points that this bug should address -
1. Why should nmstate be concerned with the SRIOV config when it is not required to change it?
2. Why isn't nmstate able to determine which of the two sriov policies represents actual desired state? If SRIOV operator was able to deploy both policies, nmstate shouldn't bother with this.
— Additional comment from Quique Llorente on 2022-05-26 10:07:51 UTC —
@azavalko Can you also add full nmstate logs either from the NNCE digest or from handler pod logs.
Clearly nmstate should not take sriov into account here. I remember we were having similar issues with vxlan + openshift-sdn; in the end they fixed it by ignoring vxlan if it is not part of the configuration. A similar solution should fix this.
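The idea of ignoring SR-IOV when it is not part of the configuration could look roughly like the sketch below. It is an illustration only, not the actual nmstate fix: `drop_unrequested_sriov` and its arguments are hypothetical names, and the interface dictionaries assume the same `ethernet` / `sr-iov` layout as the NNS output.

```python
import copy


def drop_unrequested_sriov(merged_ifaces, requested_ifaces):
    """Return a copy of merged_ifaces without SR-IOV settings nobody asked for.

    merged_ifaces    -- interfaces about to be applied and verified
                        (assumed to be the request merged with current state)
    requested_ifaces -- interfaces exactly as written in the NNCP desiredState
    """
    requested = {iface["name"]: iface for iface in requested_ifaces}
    result = []
    for iface in merged_ifaces:
        iface = copy.deepcopy(iface)  # never mutate the caller's data
        asked = requested.get(iface["name"], {})
        if "sr-iov" not in asked.get("ethernet", {}):
            # The policy did not mention SR-IOV, so leave it to whoever owns it.
            iface.get("ethernet", {}).pop("sr-iov", None)
        result.append(iface)
    return result


# The NNCP only sets addresses on eno1; SR-IOV came in from the current state.
requested = [{"name": "eno1", "type": "ethernet", "state": "up"}]
merged = [{
    "name": "eno1",
    "type": "ethernet",
    "state": "up",
    "ethernet": {"speed": 10000, "sr-iov": {"total-vfs": 32}},
}]
filtered = drop_unrequested_sriov(merged, requested)
```

With this filtering, a VF count that changes underneath nmstate (for example because the SRIOV operator is reconciling its own policies) would no longer take part in verification unless the NNCP explicitly sets it.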
— Additional comment from Adi Zavalkovsky on 2022-06-06 14:13:06 UTC —
— Additional comment from Quique Llorente on 2022-07-04 06:07:48 UTC —
@azavalko@redhat.com can you test this with nmstate scratch build azavalko@redhat.com ?
— Additional comment from Quique Llorente on 2022-07-04 06:08:49 UTC —
I mean this https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=46367733
— Additional comment from Quique Llorente on 2022-07-13 10:00:25 UTC —
@gris the scratch build https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=46367733 is missing python3-libnmstate
— Additional comment from Petr Horáček on 2022-07-14 12:58:56 UTC —
We have an RPM of nmstate that should fix it. We would like to install it on nmstate Pods, to verify that it resolves the issue first.
The RPM build will expire and will get deleted in 10 days.
@azavalko@redhat.com are you able to reproduce the issue? So we can confirm that the new RPM fixes it?
— Additional comment from on 2022-07-14 13:56:56 UTC —
The old RPM has expired. We now have a new one and will test it.
— Additional comment from on 2022-07-14 14:28:59 UTC —
Verified on both 4.10 and 4.11:
Updated brew link:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=46526028
I installed buildArch (nmstate-1.2.1-4.el8_6.src.rpm, x86_64)
I installed the new nmstate patch directly on the nmstate-handler pods (4.11 example):
for i in $(oc get pod -n openshift-nmstate -l component=kubernetes-nmstate-handler --no-headers | awk '{print $1}'); do oc exec -n openshift-nmstate $i -- dnf install -y http://download.eng.bos.redhat.com/brewroot/work/tasks/6048/46526048/nmstate-1.2.1-4.el8_6.x86_64.rpm http://download.eng.bos.redhat.com/brewroot/work/tasks/6048/46526048/python3-libnmstate-1.2.1-4.el8_6.noarch.rpm http://download.eng.bos.redhat.com/brewroot/work/tasks/6048/46526048/nmstate-plugin-ovsdb-1.2.1-4.el8_6.noarch.rpm; done
Clusters info:
1. bm03-tlv2
[cnv-qe-jenkins@cnvqe-01 cnv-tests]$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-rc.1   True        False         2d23h   Cluster version is 4.11.0-rc.1
Client Version: 4.11.0-rc.1
Kustomize Version: v4.5.4
Server Version: 4.11.0-rc.1
Kubernetes Version: v1.24.0+2dd8bb1
2. bm02-cnvqe2-rdu2
cnv-qe-jenkins@cnv-qe-infra-01:~/anat/4.10/cnv-tests$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.22   True        False         5d22h   Cluster version is 4.10.22
Client Version: 4.9.42
Server Version: 4.10.22
Kubernetes Version: v1.23.5+3afdacb
Not closing the Bugzilla because this is a verification of the fix proposition and not of the actual fix.
— Additional comment from Adi Zavalkovsky on 2022-07-17 12:15:44 UTC —
Removing needinfo flags.
— Additional comment from Gris Ge on 2022-07-19 14:45:15 UTC —
The RHEL 8.6 zstream for nmstate is approved at https://bugzilla.redhat.com/show_bug.cgi?id=2108632
It will be shipped on 2022-08-02. Is that OK for CNV?
— Additional comment from Gris Ge on 2022-07-20 01:48:39 UTC —
Official build attached to errata https://errata.devel.redhat.com/advisory/98693 and signed.
There is no code difference between previous scratch build and this build.
— Additional comment from Petr Horáček on 2022-07-21 08:00:40 UTC —
Thank you Gris for looking into this. The release date works for us.
— Additional comment from Petr Horáček on 2022-07-21 08:01:31 UTC —
Waiting until August 2 for the fix to become available in RHEL, so we can rebuild downstream images.
— Additional comment from Petr Horáček on 2022-08-22 10:07:31 UTC —
The fix should become available with nmstate-1.2.1-4.el8_6. The current released knmstate is still using nmstate-1.2.1-3.el8_6: https://catalog.redhat.com/software/containers/openshift4/ose-kubernetes-nmstate-handler-rhel8/5e97379dbed8bd66f83dffb0?tag=v4.11.0-202208020235.p0.ga6744d1.assembly.stream&push_date=1660126963000&container-tabs=packages
— Additional comment from Petr Horáček on 2022-10-03 14:27:33 UTC —
The fix should be available in the recent knmstate 4.11 builds
— Additional comment from Yossi Segev on 2022-10-20 10:32:17 UTC —
I installed a new BM cluster (OCP 4.11.9) with the latest knmstate, and it still uses nmstate-1.2.1-3.el8_6.x86_64.
Can't verify this bug yet.
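A quick way to tell whether a handler image already carries the fixed build is to compare the installed package string against nmstate-1.2.1-4.el8_6. The sketch below is a simplified comparison written for this thread, not a full RPM version comparison; `parse_nvr` and `has_fix` are hypothetical helper names, and only the numeric version plus the leading release number are compared.

```python
import re

# First build reported to contain the fix, per the comments above.
FIXED_BUILD = "1.2.1-4.el8_6"


def parse_nvr(pkg):
    """Split e.g. 'nmstate-1.2.1-3.el8_6.x86_64' into ((1, 2, 1), 3)."""
    match = re.search(r"(\d+(?:\.\d+)*)-(\d+)", pkg)
    if match is None:
        raise ValueError(f"cannot find version-release in {pkg!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    release = int(match.group(2))
    return version, release


def has_fix(pkg, fixed=FIXED_BUILD):
    """True when pkg is at least as new as the fixed build."""
    return parse_nvr(pkg) >= parse_nvr(fixed)


installed = "nmstate-1.2.1-3.el8_6.x86_64"  # as found on the OCP 4.11.9 cluster
fix_present = has_fix(installed)
```

For the package found on the 4.11.9 cluster, `fix_present` is `False`, which matches the observation that the bug cannot be verified yet.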
duplicates
- CNV-18652 [2087172] NNCP deployment flakiness when having multiple SriovNetworkNodePolicy (Closed)
- CNV-19788 [2087172] NNCP deployment flakiness when having multiple SriovNetworkNodePolicy (Closed)
- CNV-29293 [Backport-4.10.z] [2087172] NNCP deployment flakiness when having multiple SriovNetworkNodePolicy (Closed)
is blocked by
- CNV-18652 [2087172] NNCP deployment flakiness when having multiple SriovNetworkNodePolicy (Closed)
- CNV-19788 [2087172] NNCP deployment flakiness when having multiple SriovNetworkNodePolicy (Closed)
- CNV-29293 [Backport-4.10.z] [2087172] NNCP deployment flakiness when having multiple SriovNetworkNodePolicy (Closed)