-
Bug
-
Resolution: Done
-
Critical
-
4.13
-
None
-
Critical
-
No
-
NHE Sprint 233
-
1
-
Rejected
-
False
-
-
NA
Description of problem:
Deploy a dualstack OCP cluster with baremetal worker nodes, then enable OVS hardware offload by creating the SriovNetworkPoolConfig below. The ovnkube-node pods on the baremetal workers go into CrashLoopBackOff. Running 'ovs-vsctl show' on an affected worker node shows that the physical interface is no longer attached to br-ex.

# cat sriov_pool.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkPoolConfig
metadata:
  name: sriovnetworkpoolconfig-offload
  namespace: openshift-sriov-network-operator
spec:
  ovsHardwareOffloadConfig:
    name: sriov

# oc get pods -n openshift-ovn-kubernetes -o wide
NAME                   READY   STATUS             RESTARTS          AGE     IP               NODE                                       NOMINATED NODE   READINESS GATES
ovnkube-master-77mlf   6/6     Running            0                 3h29m   192.168.111.20   master-0.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-master-fm2lb   6/6     Running            0                 3h29m   192.168.111.22   master-2.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-master-skdmr   6/6     Running            2 (3h20m ago)     3h29m   192.168.111.21   master-1.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-node-7jqmp     5/5     Running            1 (3h28m ago)     3h29m   192.168.111.21   master-1.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-node-84dqq     5/5     Running            1 (3h28m ago)     3h29m   192.168.111.20   master-0.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-node-dvkkg     5/5     Running            1 (3h28m ago)     3h29m   192.168.111.22   master-2.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-node-gc6nd     5/5     Running            1 (3h9m ago)      3h9m    192.168.111.23   worker-0.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-node-gqm9t     5/5     Running            1 (3h8m ago)      3h9m    192.168.111.24   worker-1.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-node-jpfg6     4/5     CrashLoopBackOff   26 (4m13s ago)    153m    192.168.111.40   openshift-qe-025.lab.eng.rdu2.redhat.com   <none>           <none>
ovnkube-node-svljb     4/5     CrashLoopBackOff   24 (2m15s ago)    152m    192.168.111.47   openshift-qe-029.lab.eng.rdu2.redhat.com   <none>           <none>

sh-4.4# ovs-vsctl show
91e93171-1a86-48a4-a2a5-22f958c39ae8
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        Port br-int
            Interface br-int
                type: internal
        Port ovn-2a6cca-0
            Interface ovn-2a6cca-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="192.168.111.21"}
        Port ovn-f8b96d-0
            Interface ovn-f8b96d-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="192.168.111.23"}
        Port ovn-19750f-0
            Interface ovn-19750f-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="192.168.111.47"}
        Port ovn-7cc33c-0
            Interface ovn-7cc33c-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="192.168.111.24"}
        Port ovn-451129-0
            Interface ovn-451129-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="192.168.111.22"}
        Port ovn-fe9b22-0
            Interface ovn-fe9b22-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="192.168.111.20"}
        Port ovn-k8s-mp0
            Interface ovn-k8s-mp0
                type: internal
        Port patch-br-int-to-br-ex_openshift-qe-025.lab.eng.rdu2.redhat.com
            Interface patch-br-int-to-br-ex_openshift-qe-025.lab.eng.rdu2.redhat.com
                type: patch
                options: {peer=patch-br-ex_openshift-qe-025.lab.eng.rdu2.redhat.com-to-br-int}
    Bridge br-ex
        Port patch-br-ex_openshift-qe-025.lab.eng.rdu2.redhat.com-to-br-int
            Interface patch-br-ex_openshift-qe-025.lab.eng.rdu2.redhat.com-to-br-int
                type: patch
                options: {peer=patch-br-int-to-br-ex_openshift-qe-025.lab.eng.rdu2.redhat.com}
    ovs_version: "2.17.6"
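The missing uplink on br-ex can also be checked mechanically. The following is a minimal sketch, not part of the original report: it assumes the text of 'ovs-vsctl show' has been captured into a string, and the helper names (ports_by_bridge, non_patch_ports) and the abridged SAMPLE are illustrative only. It treats any port that is neither a patch port nor the bridge's own internal port as a candidate physical interface, which is a heuristic rather than a definitive rule.

```python
# Sketch: list the ports attached to each bridge in `ovs-vsctl show` output
# and report which ones on a given bridge are not patch ports.
# Helper names and SAMPLE are hypothetical; SAMPLE is abridged from the report.

def ports_by_bridge(show_output: str) -> dict[str, list[str]]:
    """Map each bridge name to the port names listed under it."""
    bridges: dict[str, list[str]] = {}
    current = None
    tokens = show_output.split()
    # Token-based parsing works whether or not line breaks were preserved.
    for i, tok in enumerate(tokens[:-1]):
        name = tokens[i + 1].strip('"')  # ovs-vsctl may quote names
        if tok == "Bridge":
            current = name
            bridges[current] = []
        elif tok == "Port" and current is not None:
            bridges[current].append(name)
    return bridges


def non_patch_ports(bridges: dict[str, list[str]], bridge: str) -> list[str]:
    """Ports on `bridge` that are neither patch ports nor the bridge itself."""
    return [
        p for p in bridges.get(bridge, [])
        if not p.startswith("patch-") and p != bridge
    ]


SAMPLE = """
Bridge br-int
    Port br-int
        Interface br-int
            type: internal
    Port ovn-k8s-mp0
        Interface ovn-k8s-mp0
            type: internal
Bridge br-ex
    Port patch-br-ex_openshift-qe-025.lab.eng.rdu2.redhat.com-to-br-int
        Interface patch-br-ex_openshift-qe-025.lab.eng.rdu2.redhat.com-to-br-int
            type: patch
ovs_version: "2.17.6"
"""

if __name__ == "__main__":
    bridges = ports_by_bridge(SAMPLE)
    uplinks = non_patch_ports(bridges, "br-ex")
    print("br-ex non-patch ports:", uplinks or "none")
```

An empty result for br-ex matches the symptom described above: only the patch port to br-int remains on the bridge.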
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Steps to Reproduce:
1. Deploy a dualstack cluster and add baremetal hosts as worker nodes.
2. Install the SR-IOV Network Operator.
3. Enable OVS hardware offload by creating the YAML below:
   # cat sriov_pool.yaml
   apiVersion: sriovnetwork.openshift.io/v1
   kind: SriovNetworkPoolConfig
   metadata:
     name: sriovnetworkpoolconfig-offload
     namespace: openshift-sriov-network-operator
   spec:
     ovsHardwareOffloadConfig:
       name: sriov
4. Check the ovnkube pods.
5. Check 'ovs-vsctl show' on the worker node.
Actual results:
ovnkube pods crashed
Expected results:
ovnkube pods should not crash
Additional info:
ovnkube pods logs:

[root@openshift-qe-026 offload]# oc get pods -n openshift-ovn-kubernetes -o wide
NAME                   READY   STATUS             RESTARTS        AGE    IP               NODE                                       NOMINATED NODE   READINESS GATES
ovnkube-master-77mlf   6/6     Running            0               3h2m   192.168.111.20   master-0.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-master-fm2lb   6/6     Running            0               3h2m   192.168.111.22   master-2.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-master-skdmr   6/6     Running            2 (173m ago)    3h2m   192.168.111.21   master-1.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-node-7jqmp     5/5     Running            1 (3h ago)      3h2m   192.168.111.21   master-1.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-node-84dqq     5/5     Running            1 (3h ago)      3h2m   192.168.111.20   master-0.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-node-dvkkg     5/5     Running            1 (3h ago)      3h2m   192.168.111.22   master-2.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-node-gc6nd     5/5     Running            1 (162m ago)    162m   192.168.111.23   worker-0.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-node-gqm9t     5/5     Running            1 (161m ago)    162m   192.168.111.24   worker-1.offload.openshift-qe.sdn.com      <none>           <none>
ovnkube-node-jpfg6     4/5     CrashLoopBackOff   21 (2m8s ago)   125m   192.168.111.40   openshift-qe-025.lab.eng.rdu2.redhat.com   <none>           <none>
ovnkube-node-svljb     4/5     CrashLoopBackOff   19 (25s ago)    124m   192.168.111.47   openshift-qe-029.lab.eng.rdu2.redhat.com   <none>           <none>

[root@openshift-qe-026 offload]# oc logs ovnkube-node-jpfg6 -n openshift-ovn-kubernetes
Defaulted container "ovn-controller" out of: ovn-controller, ovn-acl-logging, kube-rbac-proxy, kube-rbac-proxy-ovn-metrics, ovnkube-node
2023-02-22T06:15:34+00:00 - starting ovn-controller
2023-02-22T06:15:34Z|00001|vlog|INFO|opened log file /var/log/ovn/acl-audit-log.log
2023-02-22T06:15:34.651Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2023-02-22T06:15:34.651Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2023-02-22T06:15:34.653Z|00004|main|INFO|OVN internal version is : [22.12.1-20.27.0-70.6]
2023-02-22T06:15:34.653Z|00005|main|INFO|OVS IDL reconnected, force recompute.
2023-02-22T06:15:34.656Z|00006|reconnect|INFO|ssl:192.168.111.21:9642: connecting...
2023-02-22T06:15:34.656Z|00007|main|INFO|OVNSB IDL reconnected, force recompute.
2023-02-22T06:15:34.661Z|00008|reconnect|INFO|ssl:192.168.111.21:9642: connected
2023-02-22T06:15:34.748Z|00009|features|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2023-02-22T06:15:34.748Z|00010|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2023-02-22T06:15:34.750Z|00011|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2023-02-22T06:15:34.750Z|00012|features|INFO|OVS Feature: ct_zero_snat, state: supported
2023-02-22T06:15:34.750Z|00013|main|INFO|OVS feature set changed, force recompute.
2023-02-22T06:15:34.750Z|00014|ofctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2023-02-22T06:15:34.750Z|00015|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2023-02-22T06:15:34.751Z|00016|main|INFO|OVS feature set changed, force recompute.
2023-02-22T06:15:34.751Z|00017|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2023-02-22T06:15:34.751Z|00018|binding|INFO|Releasing lport openshift-network-diagnostics_network-check-target-zl74q from this chassis (sb_readonly=0)
2023-02-22T06:15:34.751Z|00019|if_status|WARN|Trying to release unknown interface openshift-network-diagnostics_network-check-target-zl74q
2023-02-22T06:15:34.751Z|00020|binding|INFO|Releasing lport openshift-multus_network-metrics-daemon-jxqlb from this chassis (sb_readonly=0)
2023-02-22T06:15:34.751Z|00021|binding|INFO|Releasing lport openshift-ingress-canary_ingress-canary-mnxvc from this chassis (sb_readonly=0)
2023-02-22T06:15:34.751Z|00022|binding|INFO|Releasing lport openshift-cluster-csi-drivers_shared-resource-csi-driver-node-s2fdx from this chassis (sb_readonly=0)
2023-02-22T06:15:34.751Z|00023|binding|INFO|Releasing lport openshift-dns_dns-default-rf5cq from this chassis (sb_readonly=0)
2023-02-22T06:15:34.789Z|00001|pinctrl(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2023-02-22T06:15:34.789Z|00002|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2023-02-22T06:15:34.789Z|00003|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2023-02-22T06:15:44.757Z|00024|memory|INFO|26836 kB peak resident set size after 10.1 seconds
2023-02-22T06:15:44.757Z|00025|memory|INFO|idl-cells-OVN_Southbound:39984 idl-cells-Open_vSwitch:644 lflow-cache-entries-cache-expr:563 lflow-cache-entries-cache-matches:817 lflow-cache-size-KB:1467 local_datapath_usage-KB:1 ofctrl_desired_flow_usage-KB:682 ofctrl_installed_flow_usage-KB:528 ofctrl_sb_flow_ref_usage-KB:305
2023-02-22T06:16:07.540Z|00026|lflow_cache|INFO|Detected cache inactivity (last active 30004 ms ago): trimming cache
2023-02-22T06:17:46.291Z|00027|lflow_cache|INFO|Detected cache inactivity (last active 30002 ms ago): trimming cache
2023-02-22T06:20:06.185Z|00028|lflow_cache|INFO|Detected cache inactivity (last active 30002 ms ago): trimming cache
2023-02-22T06:24:08.043Z|00029|lflow_cache|INFO|Detected cache inactivity (last active 30004 ms ago): trimming cache
2023-02-22T06:26:26.560Z|00030|lflow_cache|INFO|Detected cache inactivity (last active 30001 ms ago): trimming cache
2023-02-22T06:28:03.488Z|00031|lflow_cache|INFO|Detected cache inactivity (last active 30004 ms ago): trimming cache
2023-02-22T06:31:18.754Z|00032|lflow_cache|INFO|Detected cache inactivity (last active 30004 ms ago): trimming cache
2023-02-22T06:34:14.578Z|00033|lflow_cache|INFO|Detected cache inactivity (last active 30005 ms ago): trimming cache
2023-02-22T06:36:54.982Z|00034|lflow_cache|INFO|Detected cache inactivity (last active 30005 ms ago): trimming cache
2023-02-22T06:42:19.599Z|00035|lflow_cache|INFO|Detected cache inactivity (last active 30004 ms ago): trimming cache
2023-02-22T06:44:34.134Z|00036|lflow_cache|INFO|Detected cache inactivity (last active 30004 ms ago): trimming cache
2023-02-22T06:45:37.803Z|00037|lflow_cache|INFO|Detected cache inactivity (last active 30002 ms ago): trimming cache

[root@openshift-qe-026 offload]# oc logs ovnkube-node-svljb -n openshift-ovn-kubernetes
Defaulted container "ovn-controller" out of: ovn-controller, ovn-acl-logging, kube-rbac-proxy, kube-rbac-proxy-ovn-metrics, ovnkube-node
2023-02-22T06:27:30+00:00 - starting ovn-controller
2023-02-22T06:27:30Z|00001|vlog|INFO|opened log file /var/log/ovn/acl-audit-log.log
2023-02-22T06:27:30.480Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2023-02-22T06:27:30.480Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2023-02-22T06:27:30.482Z|00004|main|INFO|OVN internal version is : [22.12.1-20.27.0-70.6]
2023-02-22T06:27:30.482Z|00005|main|INFO|OVS IDL reconnected, force recompute.
2023-02-22T06:27:30.486Z|00006|reconnect|INFO|ssl:192.168.111.22:9642: connecting...
2023-02-22T06:27:30.486Z|00007|main|INFO|OVNSB IDL reconnected, force recompute.
2023-02-22T06:27:30.502Z|00008|reconnect|INFO|ssl:192.168.111.22:9642: connected
2023-02-22T06:27:30.600Z|00009|features|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2023-02-22T06:27:30.600Z|00010|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2023-02-22T06:27:30.604Z|00011|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2023-02-22T06:27:30.604Z|00012|features|INFO|OVS Feature: ct_zero_snat, state: supported
2023-02-22T06:27:30.604Z|00013|main|INFO|OVS feature set changed, force recompute.
2023-02-22T06:27:30.604Z|00014|ofctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2023-02-22T06:27:30.604Z|00015|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2023-02-22T06:27:30.605Z|00016|main|INFO|OVS feature set changed, force recompute.
2023-02-22T06:27:30.605Z|00017|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2023-02-22T06:27:30.605Z|00018|binding|INFO|Releasing lport openshift-ingress-canary_ingress-canary-sdqkp from this chassis (sb_readonly=0)
2023-02-22T06:27:30.605Z|00019|if_status|WARN|Trying to release unknown interface openshift-ingress-canary_ingress-canary-sdqkp
2023-02-22T06:27:30.605Z|00020|binding|INFO|Releasing lport openshift-network-diagnostics_network-check-target-9mvkj from this chassis (sb_readonly=0)
2023-02-22T06:27:30.605Z|00021|binding|INFO|Releasing lport openshift-dns_dns-default-jlq8p from this chassis (sb_readonly=0)
2023-02-22T06:27:30.605Z|00022|binding|INFO|Releasing lport openshift-cluster-csi-drivers_shared-resource-csi-driver-node-qfj6f from this chassis (sb_readonly=0)
2023-02-22T06:27:30.605Z|00023|binding|INFO|Releasing lport openshift-multus_network-metrics-daemon-ghnsn from this chassis (sb_readonly=0)
2023-02-22T06:27:30.651Z|00001|pinctrl(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2023-02-22T06:27:30.651Z|00002|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2023-02-22T06:27:30.651Z|00003|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2023-02-22T06:27:40.509Z|00024|memory|INFO|24848 kB peak resident set size after 10.0 seconds
2023-02-22T06:27:40.509Z|00025|memory|INFO|idl-cells-OVN_Southbound:39933 idl-cells-Open_vSwitch:644 lflow-cache-entries-cache-expr:563 lflow-cache-entries-cache-matches:817 lflow-cache-size-KB:1467 local_datapath_usage-KB:1 ofctrl_desired_flow_usage-KB:675 ofctrl_installed_flow_usage-KB:521 ofctrl_sb_flow_ref_usage-KB:303
2023-02-22T06:28:03.486Z|00026|lflow_cache|INFO|Detected cache inactivity (last active 30003 ms ago): trimming cache
2023-02-22T06:31:18.753Z|00027|lflow_cache|INFO|Detected cache inactivity (last active 30003 ms ago): trimming cache
2023-02-22T06:34:14.578Z|00028|lflow_cache|INFO|Detected cache inactivity (last active 30004 ms ago): trimming cache
2023-02-22T06:36:54.981Z|00029|lflow_cache|INFO|Detected cache inactivity (last active 30005 ms ago): trimming cache
2023-02-22T06:42:19.598Z|00030|lflow_cache|INFO|Detected cache inactivity (last active 30004 ms ago): trimming cache
2023-02-22T06:44:34.134Z|00031|lflow_cache|INFO|Detected cache inactivity (last active 30003 ms ago): trimming cache
2023-02-22T06:45:37.803Z|00032|lflow_cache|INFO|Detected cache inactivity (last active 30001 ms ago): trimming cache

must-gather logs: https://file.apac.redhat.com/~yingwang/must-gather.tar.gz
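Note that both 'oc logs' invocations above defaulted to the ovn-controller container, so the captured output does not necessarily come from the container that is failing. As a rough sketch (not part of the original report), the snippet below picks the CrashLoopBackOff pods out of the 'oc get pods' table and builds one 'oc logs' command per container named in the "Defaulted container" message. The function names and the trimmed TABLE are illustrative assumptions; which container is actually crashing is not stated in the report.

```python
# Sketch: find CrashLoopBackOff pods in `oc get pods` output and build
# per-container `oc logs` commands. Function names are hypothetical.

# Container names as listed in the "Defaulted container" message above.
CONTAINERS = [
    "ovn-controller",
    "ovn-acl-logging",
    "kube-rbac-proxy",
    "kube-rbac-proxy-ovn-metrics",
    "ovnkube-node",
]

# Trimmed copy of the pod listing from this report.
TABLE = """\
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ovnkube-node-gqm9t 5/5 Running 1 (161m ago) 162m 192.168.111.24 worker-1.offload.openshift-qe.sdn.com <none> <none>
ovnkube-node-jpfg6 4/5 CrashLoopBackOff 21 (2m8s ago) 125m 192.168.111.40 openshift-qe-025.lab.eng.rdu2.redhat.com <none> <none>
ovnkube-node-svljb 4/5 CrashLoopBackOff 19 (25s ago) 124m 192.168.111.47 openshift-qe-029.lab.eng.rdu2.redhat.com <none> <none>
"""


def crashlooping_pods(table: str) -> list[str]:
    """Return names of pods whose STATUS column is CrashLoopBackOff."""
    pods = []
    for line in table.splitlines():
        fields = line.split()
        # Only the first three columns (NAME, READY, STATUS) are needed;
        # later columns such as RESTARTS contain spaces and are ignored.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        if fields[2] == "CrashLoopBackOff":
            pods.append(fields[0])
    return pods


def log_commands(pods: list[str],
                 namespace: str = "openshift-ovn-kubernetes") -> list[str]:
    """Build one `oc logs` command per pod and container."""
    return [
        f"oc logs {pod} -n {namespace} -c {container} --previous"
        for pod in pods
        for container in CONTAINERS
    ]


if __name__ == "__main__":
    for cmd in log_commands(crashlooping_pods(TABLE)):
        print(cmd)
```

The --previous flag asks for the logs of the last terminated instance of a container; for containers that have not restarted the command returns an error, which can be ignored here.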
- depends on
-
OCPBUGS-10279 [4.14] ovnkube pod crashed after enable ovs hardware offload in baremetal cluster
- Closed
- is cloned by
-
OCPBUGS-10279 [4.14] ovnkube pod crashed after enable ovs hardware offload in baremetal cluster
- Closed
-
OCPBUGS-10280 [4.12] ovnkube pod crashed after enable ovs hardware offload in baremetal cluster
- Closed
- is depended on by
-
OCPBUGS-10280 [4.12] ovnkube pod crashed after enable ovs hardware offload in baremetal cluster
- Closed