Bug
Resolution: Not a Bug
4.16.z
Quality / Stability / Reliability
Description of problem:
Observed in RHOSO (https://issues.redhat.com/browse/OSPRH-9899), where our use case creates multiple VLAN interfaces with multiple routes attached to them. On these VLAN interfaces we create macvlan NetworkAttachmentDefinitions and attach them to pods. After periodic nmstate-operator updates (or a restart of nmstate-handler), we noticed that some VLAN interfaces get recreated without any change to the "NodeNetworkConfigurationPolicy" CR, which removes the secondary NICs (NetworkAttachmentDefinitions) from the pods. The pods must then be recreated to get the lost interfaces back. Initial findings suggest this only happens when multiple IP routes are configured on these VLAN interfaces and table-id is not set explicitly. We are currently working around it by setting the optional table-id field explicitly: https://github.com/openstack-k8s-operators/architecture/pull/460
Version-Release number of selected component (if applicable):
$ oc get csv -n openshift-nmstate
kubernetes-nmstate-operator.4.16.0-202411251535   Kubernetes NMState Operator   4.16.0-202411251535   kubernetes-nmstate-operator.4.16.0-202411190033   Succeeded
$ oc version
Client Version: 4.17.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.16.0
Kubernetes Version: v1.29.5+29c95f3
How reproducible:
Reproduces on random interfaces with the CR below (for me it reproduced quite consistently on one interface or another; adding more routes or VLAN interfaces should make it more reproducible):
$ cat reproduce.yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: test-vlan
spec:
  desiredState:
    interfaces:
    - description: vlan interface 11
      ipv4:
        address:
        - ip: 172.11.0.5
          prefix-length: 24
        dhcp: false
        enabled: true
      ipv6:
        enabled: false
      name: enp5s0.11
      state: up
      type: vlan
      vlan:
        base-iface: enp5s0
        id: 21
    - description: vlan interface 12
      ipv4:
        address:
        - ip: 172.12.0.5
          prefix-length: 24
        dhcp: false
        enabled: true
      ipv6:
        enabled: false
      name: enp5s0.12
      state: up
      type: vlan
      vlan:
        base-iface: enp5s0
        id: 25
    routes:
      config:
      - destination: 172.11.10.0/24
        next-hop-address: 172.11.0.1
        next-hop-interface: enp5s0.11
      - destination: 172.11.20.0/24
        next-hop-address: 172.11.0.1
        next-hop-interface: enp5s0.11
      - destination: 172.11.30.0/24
        next-hop-address: 172.11.0.1
        next-hop-interface: enp5s0.11
      - destination: 172.11.40.0/24
        next-hop-address: 172.11.0.1
        next-hop-interface: enp5s0.11
      - destination: 172.12.10.0/24
        next-hop-address: 172.12.0.1
        next-hop-interface: enp5s0.12
      - destination: 172.12.20.0/24
        next-hop-address: 172.12.0.1
        next-hop-interface: enp5s0.12
      - destination: 172.12.30.0/24
        next-hop-address: 172.12.0.1
        next-hop-interface: enp5s0.12
      - destination: 172.12.40.0/24
        next-hop-address: 172.12.0.1
        next-hop-interface: enp5s0.12
Steps to Reproduce:
1. oc apply -f reproduce.yaml
2. Check the interface indexes with ip a | grep enp5s0
3. Delete nmstate handler pod with oc delete -n openshift-nmstate $(oc get pod -n openshift-nmstate -l component=kubernetes-nmstate-handler --no-headers -o name)
4. Wait for the NNCP to be reapplied: oc get nncp test-vlan -w
5. Recheck the interface indexes with ip a | grep enp5s0
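The comparison in steps 2 and 5 can be automated. The following is a minimal sketch, not part of nmstate or the original report: the helper names `parse_ifindexes` and `recreated_interfaces` are hypothetical, and the sample text is copied from the `ip a | grep enp5s0` output recorded in this report. It assumes that a changed kernel interface index for the same name means the interface was deleted and recreated.

```python
import re

# Matches link header lines such as "9940: enp5s0.11@enp5s0: <BROADCAST,...>".
# Address lines ("inet ...") and ssh warnings do not start with a number and are skipped.
_LINK_RE = re.compile(r"^\s*(\d+):\s+([^:@\s]+)(?:@[^:\s]+)?:")


def parse_ifindexes(ip_a_output):
    """Return {interface name: kernel interface index} parsed from `ip a` text."""
    indexes = {}
    for line in ip_a_output.splitlines():
        match = _LINK_RE.match(line)
        if match:
            indexes[match.group(2)] = int(match.group(1))
    return indexes


def recreated_interfaces(before, after):
    """Return sorted names present in both snapshots whose index changed."""
    old, new = parse_ifindexes(before), parse_ifindexes(after)
    return sorted(name for name in old if name in new and old[name] != new[name])


# Sample snapshots taken from the transcript in this report.
BEFORE = """\
Warning: Permanently added 'crc-0.utility' (ED25519) to the list of known hosts.
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000
9940: enp5s0.11@enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 172.11.0.5/24 brd 172.11.0.255 scope global noprefixroute enp5s0.11
9941: enp5s0.12@enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 172.12.0.5/24 brd 172.12.0.255 scope global noprefixroute enp5s0.12
"""
AFTER = BEFORE.replace("9941: enp5s0.12", "9944: enp5s0.12")

if __name__ == "__main__":
    print(recreated_interfaces(BEFORE, AFTER))  # ['enp5s0.12']
```

Feeding it the output captured before and after the handler restart gives a quick yes/no answer without comparing the numbers by eye.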
When the issue reproduces, the interface index changes, as shown below:
[zuul@controller-0 ~]$ oc apply -f reproduce.yaml
nodenetworkconfigurationpolicy.nmstate.io/test-vlan created
[zuul@controller-0 ~]$ oc get nncp test-vlan -w
NAME STATUS REASON
test-vlan Progressing ConfigurationProgressing
test-vlan Progressing ConfigurationProgressing
test-vlan Available SuccessfullyConfigured
[zuul@controller-0 ~]$ ssh crc-0 ip a|grep enp5s0
Warning: Permanently added 'crc-0.utility' (ED25519) to the list of known hosts.
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000
9940: enp5s0.11@enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
inet 172.11.0.5/24 brd 172.11.0.255 scope global noprefixroute enp5s0.11
9941: enp5s0.12@enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
inet 172.12.0.5/24 brd 172.12.0.255 scope global noprefixroute enp5s0.12
[zuul@controller-0 ~]$ ssh crc-0 nmcli -g ipv4.routes c show enp5s0.11
Warning: Permanently added 'crc-0.utility' (ED25519) to the list of known hosts.
172.11.10.0/24 172.11.0.1 0, 172.11.20.0/24 172.11.0.1 0, 172.11.30.0/24 172.11.0.1 0, 172.11.40.0/24 172.11.0.1 0
[zuul@controller-0 ~]$ ssh crc-0 nmcli -g ipv4.routes c show enp5s0.12
Warning: Permanently added 'crc-0.utility' (ED25519) to the list of known hosts.
172.12.10.0/24 172.12.0.1 0, 172.12.20.0/24 172.12.0.1 0, 172.12.30.0/24 172.12.0.1 0, 172.12.40.0/24 172.12.0.1 0
[zuul@controller-0 ~]$ oc delete -n openshift-nmstate $(oc get pod -n openshift-nmstate -l component=kubernetes-nmstate-handler --no-headers -o name)
pod "nmstate-handler-f7qlf" deleted
[zuul@controller-0 ~]$ oc get nncp test-vlan -w
NAME STATUS REASON
test-vlan Available SuccessfullyConfigured
test-vlan
test-vlan
test-vlan Progressing ConfigurationProgressing
test-vlan Progressing ConfigurationProgressing
test-vlan Available SuccessfullyConfigured
[zuul@controller-0 ~]$ ssh crc-0 ip a|grep enp5s0
Warning: Permanently added 'crc-0.utility' (ED25519) to the list of known hosts.
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000
9940: enp5s0.11@enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
inet 172.11.0.5/24 brd 172.11.0.255 scope global noprefixroute enp5s0.11
9944: enp5s0.12@enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
inet 172.12.0.5/24 brd 172.12.0.255 scope global noprefixroute enp5s0.12
# enp5s0.12 was recreated: its interface index changed from 9941 to 9944
[zuul@controller-0 ~]$ ssh crc-0 nmcli -g ipv4.routes c show enp5s0.11
Warning: Permanently added 'crc-0.utility' (ED25519) to the list of known hosts.
172.11.10.0/24 172.11.0.1 0 table=254, 172.11.20.0/24 172.11.0.1 0 table=254, 172.11.30.0/24 172.11.0.1 0 table=254, 172.11.40.0/24 172.11.0.1 0 table=254
[zuul@controller-0 ~]$ ssh crc-0 nmcli -g ipv4.routes c show enp5s0.12
Warning: Permanently added 'crc-0.utility' (ED25519) to the list of known hosts.
172.12.10.0/24 172.12.0.1 0, 172.12.20.0/24 172.12.0.1 0 table=254, 172.12.30.0/24 172.12.0.1 0, 172.12.40.0/24 172.12.0.1 0
# enp5s0.11 was not recreated, likely because all 4 of its routes have table=254 set
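The pattern noted above (the recreated interface has a mix of routes with and without table=254, while the untouched one has table=254 on every route) can be checked mechanically. This is a sketch under assumptions: `parse_routes` and `has_mixed_tables` are hypothetical helper names, the input format is the `nmcli -g ipv4.routes` output shown in this report, and the check only mirrors the observation here, not nmstate's actual comparison logic.

```python
def parse_routes(nmcli_routes):
    """Parse `nmcli -g ipv4.routes` text into a list of route dicts.

    Each comma-separated entry looks like
    "172.12.20.0/24 172.12.0.1 0 table=254" (destination, next hop,
    metric, then optional key=value attributes).
    """
    routes = []
    for entry in nmcli_routes.split(","):
        fields = entry.split()
        if not fields:
            continue
        attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        routes.append({"destination": fields[0], "table": attrs.get("table")})
    return routes


def has_mixed_tables(nmcli_routes):
    """True when some routes carry an explicit table= attribute and others do not."""
    explicit = {route["table"] is not None for route in parse_routes(nmcli_routes)}
    return len(explicit) == 2


# Values copied from the transcript after the handler restart.
ROUTES_11_AFTER = (
    "172.11.10.0/24 172.11.0.1 0 table=254, 172.11.20.0/24 172.11.0.1 0 table=254, "
    "172.11.30.0/24 172.11.0.1 0 table=254, 172.11.40.0/24 172.11.0.1 0 table=254"
)
ROUTES_12_AFTER = (
    "172.12.10.0/24 172.12.0.1 0, 172.12.20.0/24 172.12.0.1 0 table=254, "
    "172.12.30.0/24 172.12.0.1 0, 172.12.40.0/24 172.12.0.1 0"
)

if __name__ == "__main__":
    print(has_mixed_tables(ROUTES_11_AFTER))  # False
    print(has_mixed_tables(ROUTES_12_AFTER))  # True
```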
Actual results:
Interfaces are recreated without any change in the desired state.
Expected results:
Interfaces should not be recreated on an nmstate operator update or an nmstate-handler restart when desiredState has not changed.
Additional info:
The issue does not reproduce if table-id is set explicitly, i.e.:
$ cat noreproduce.yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: test-vlan
spec:
  desiredState:
    interfaces:
    - description: vlan interface 11
      ipv4:
        address:
        - ip: 172.11.0.5
          prefix-length: 24
        dhcp: false
        enabled: true
      ipv6:
        enabled: false
      name: enp5s0.11
      state: up
      type: vlan
      vlan:
        base-iface: enp5s0
        id: 21
    - description: vlan interface 12
      ipv4:
        address:
        - ip: 172.12.0.5
          prefix-length: 24
        dhcp: false
        enabled: true
      ipv6:
        enabled: false
      name: enp5s0.12
      state: up
      type: vlan
      vlan:
        base-iface: enp5s0
        id: 25
    routes:
      config:
      - destination: 172.11.10.0/24
        next-hop-address: 172.11.0.1
        next-hop-interface: enp5s0.11
        table-id: 254
      - destination: 172.11.20.0/24
        next-hop-address: 172.11.0.1
        next-hop-interface: enp5s0.11
        table-id: 254
      - destination: 172.11.30.0/24
        next-hop-address: 172.11.0.1
        next-hop-interface: enp5s0.11
        table-id: 254
      - destination: 172.11.40.0/24
        next-hop-address: 172.11.0.1
        next-hop-interface: enp5s0.11
        table-id: 254
      - destination: 172.12.10.0/24
        next-hop-address: 172.12.0.1
        next-hop-interface: enp5s0.12
        table-id: 254
      - destination: 172.12.20.0/24
        next-hop-address: 172.12.0.1
        next-hop-interface: enp5s0.12
        table-id: 254
      - destination: 172.12.30.0/24
        next-hop-address: 172.12.0.1
        next-hop-interface: enp5s0.12
        table-id: 254
      - destination: 172.12.40.0/24
        next-hop-address: 172.12.0.1
        next-hop-interface: enp5s0.12
        table-id: 254
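For policies with many routes, the workaround of setting table-id on every route can be applied programmatically before the CR is rendered. The sketch below is illustrative only: `with_explicit_table_id` is a hypothetical helper, not an nmstate or operator API. It operates on an already-loaded desiredState dictionary and assumes 254 (the Linux main routing table) is the intended table, as in the policy above.

```python
import copy

MAIN_TABLE_ID = 254  # Linux main routing table (RT_TABLE_MAIN)


def with_explicit_table_id(desired_state, table_id=MAIN_TABLE_ID):
    """Return a copy of desired_state in which every route has a table-id.

    Routes that already set table-id are left untouched; the input
    dictionary is not modified.
    """
    patched = copy.deepcopy(desired_state)
    for route in patched.get("routes", {}).get("config", []):
        route.setdefault("table-id", table_id)
    return patched


# Small example: one route without table-id, one with a custom table (hypothetical value).
EXAMPLE_STATE = {
    "routes": {
        "config": [
            {
                "destination": "172.11.10.0/24",
                "next-hop-address": "172.11.0.1",
                "next-hop-interface": "enp5s0.11",
            },
            {
                "destination": "172.12.10.0/24",
                "next-hop-address": "172.12.0.1",
                "next-hop-interface": "enp5s0.12",
                "table-id": 100,
            },
        ]
    }
}

if __name__ == "__main__":
    for route in with_explicit_table_id(EXAMPLE_STATE)["routes"]["config"]:
        print(route["destination"], route["table-id"])
```

Using `setdefault` keeps any table a route already specifies, so the helper only fills in the value that would otherwise be left implicit.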
- is triggered by: OSPRH-9899 ovn-controller loses the connection to ovsdbservers after nmstate is automatically upgraded to newer version
- Verified