-
Bug
-
Resolution: Done-Errata
-
Major
-
None
-
False
-
-
False
-
CLOSED
-
High
-
No
Description of problem:
When a destructive configuration policy is applied, i.e. one that involves all physical NICs of a node and is expected to break the node's connectivity, the NNCE reports one status while the NNCP reports another.
Version-Release number of selected component (if applicable):
kubernetes-nmstate-handler-rhel8@sha256:4a1379bf1223cf064e54419721045ca1275ae57a04433db78d4a54e1269acee1
CNAO: sha256_379cfaaba59bae6089af24bb25c104e399e867b6732e5c8a33caf235
How reproducible:
Most of the time (the bug does not always occur).
Steps to Reproduce:
1. Apply a valid NNCP that affects all physical NICs of a node.
In the example given here, I set all the NICs, which originally had dynamic IPs, to have static IPs. For each NIC I used the same IP that the DHCP server had assigned to it, to avoid IP conflicts.
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: static-nics
spec:
  desiredState:
    interfaces:
    - name: ens3
      type: ethernet
      state: up
      ipv4:
        address:
        - ip: 172.16.0.33
          prefix-length: 24
        dhcp: false
        enabled: true
    - name: ens6
      type: ethernet
      state: up
      ipv4:
        address:
        - ip: 172.16.0.19
          prefix-length: 24
        dhcp: false
        enabled: true
    - name: ens7
      type: ethernet
      state: up
      ipv4:
        address:
        - ip: 172.16.0.49
          prefix-length: 24
        dhcp: false
        enabled: true
    - name: ens8
      type: ethernet
      state: up
      ipv4:
        address:
        - ip: 172.16.0.14
          prefix-length: 24
        dhcp: false
        enabled: true
  nodeSelector:
    kubernetes.io/hostname: "host-172-16-0-33"
2. After a long enough timeout (~5 minutes), check the IP addresses of all the NICs that were set in this NNCP:
[cnv-qe-jenkins@cnv-executor-ysegev-4-3 yossi]$ ssh core@172.16.0.33 ip addr show dev ens3
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
link/ether fa:16:3e:dc:3f:f6 brd ff:ff:ff:ff:ff:ff
inet 172.16.0.33/24 brd 172.16.0.255 scope global dynamic noprefixroute ens3
valid_lft 86195sec preferred_lft 86195sec
inet6 fe80::f816:3eff:fedc:3ff6/64 scope link noprefixroute
valid_lft forever preferred_lft forever
[cnv-qe-jenkins@cnv-executor-ysegev-4-3 yossi]$ ssh core@172.16.0.33 ip addr show dev ens6
3: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
link/ether fa:16:3e:2e:76:aa brd ff:ff:ff:ff:ff:ff
inet 172.16.0.19/24 brd 172.16.0.255 scope global dynamic noprefixroute ens6
valid_lft 86192sec preferred_lft 86192sec
inet6 fe80::f816:3eff:fe2e:76aa/64 scope link noprefixroute
valid_lft forever preferred_lft forever
[cnv-qe-jenkins@cnv-executor-ysegev-4-3 yossi]$ ssh core@172.16.0.33 ip addr show dev ens7
4: ens7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
link/ether fa:16:3e:9d:e8:a3 brd ff:ff:ff:ff:ff:ff
inet 172.16.0.49/24 brd 172.16.0.255 scope global dynamic noprefixroute ens7
valid_lft 86189sec preferred_lft 86189sec
[cnv-qe-jenkins@cnv-executor-ysegev-4-3 yossi]$ ssh core@172.16.0.33 ip addr show dev ens8
5: ens8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
link/ether fa:16:3e:2d:ef:00 brd ff:ff:ff:ff:ff:ff
inet 172.16.0.14/24 brd 172.16.0.255 scope global dynamic noprefixroute ens8
valid_lft 86186sec preferred_lft 86186sec
In every case the address line contains the word "dynamic", which implies that the intended policy configuration was considered destructive and was therefore rolled back.
3. Check the status of both NNCP and NNCE:
[cnv-qe-jenkins@cnv-executor-ysegev-4-3 yossi]$ oc get nncp static-nics
NAME STATUS
static-nics SuccessfullyConfigured
[cnv-qe-jenkins@cnv-executor-ysegev-4-3 yossi]$
[cnv-qe-jenkins@cnv-executor-ysegev-4-3 yossi]$ oc get nnce host-172-16-0-33.static-nics
NAME STATUS
host-172-16-0-33.static-nics ConfigurationProgressing
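The check in step 2 (looking for the word "dynamic" on each IPv4 address line) can be scripted. The following is only a minimal Python sketch: the function name dynamic_ipv4_addresses is hypothetical, and it assumes the "ip addr show" output format seen in the transcripts above.

```python
def dynamic_ipv4_addresses(ip_addr_output: str) -> list[str]:
    """Return the IPv4 addresses that `ip addr show` output flags as dynamic.

    Assumption: IPv4 address lines have the form seen in this report, e.g.
    'inet 172.16.0.33/24 brd 172.16.0.255 scope global dynamic noprefixroute ens3'.
    """
    found = []
    for line in ip_addr_output.splitlines():
        tokens = line.split()
        # Only IPv4 address lines start with the token "inet" (IPv6 uses "inet6").
        if len(tokens) >= 2 and tokens[0] == "inet" and "dynamic" in tokens[2:]:
            found.append(tokens[1])
    return found


if __name__ == "__main__":
    sample = (
        "2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000\n"
        "    link/ether fa:16:3e:dc:3f:f6 brd ff:ff:ff:ff:ff:ff\n"
        "    inet 172.16.0.33/24 brd 172.16.0.255 scope global dynamic noprefixroute ens3\n"
        "       valid_lft 86195sec preferred_lft 86195sec\n"
    )
    # A non-empty result means the address still comes from DHCP,
    # i.e. the static configuration is not in effect on that NIC.
    print(dynamic_ipv4_addresses(sample))
```

An empty list for a NIC would indicate that its address is no longer marked as DHCP-assigned.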
Actual results:
<BUG> The NNCP and the NNCE each show a different status ("SuccessfullyConfigured" and "ConfigurationProgressing" respectively), and both are wrong.
In addition, the NNCE description ("oc get nnce host-172-16-0-33.static-nics -o yaml") does not include a rollback message.
Expected results:
1. The status of both NNCP and NNCE should be "ConfigurationFailed".
2. The current status condition in the NNCE should include a rollback message (search for the string "rollback" to verify).
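The two expected results can be expressed as a small check. This is a hedged Python sketch, not part of kubernetes-nmstate: the function name and its parameters are hypothetical, and the caller is assumed to have already extracted the two status strings and the NNCE condition messages (for example from the "oc get ... -o yaml" output).

```python
def rollback_reporting_problems(nncp_status, nnce_status, nnce_condition_messages):
    """Return a list of deviations from the expected results of this report.

    All three arguments are plain values supplied by the caller; how they are
    extracted from the cluster is outside the scope of this sketch.
    """
    expected = "ConfigurationFailed"
    problems = []
    if nncp_status != expected:
        problems.append(f"NNCP status is {nncp_status!r}, expected {expected!r}")
    if nnce_status != expected:
        problems.append(f"NNCE status is {nnce_status!r}, expected {expected!r}")
    # Expected result 2: some condition message should mention the rollback.
    if not any("rollback" in message.lower() for message in nnce_condition_messages):
        problems.append("no NNCE condition message mentions a rollback")
    return problems


if __name__ == "__main__":
    # Values observed in this report: three deviations are listed.
    for problem in rollback_reporting_problems(
        "SuccessfullyConfigured", "ConfigurationProgressing", []
    ):
        print(problem)
```

With the statuses observed in this report the function lists all three deviations; with the expected statuses and a rollback message it returns an empty list.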
Additional info:
This bug also occurred in other scenarios, e.g. when the static IPs in the policy were different from those already assigned dynamically by the DHCP server.
However, in that scenario the bug did not occur consistently, and in some cases the behavior was as expected (i.e. both NNCE and NNCP showed the status "ConfigurationFailed", and the NNCE description included a rollback message).
The node's journalctl output is attached, with nmstate at TRACE log level. It covers the timeline starting just before the policy was applied.