Bug
Resolution: Cannot Reproduce
4.14
Quality / Stability / Reliability
Low
Description of problem:
When a new SriovNetworkNodePolicy is created, the SNO node gets drained and all existing pods are restarted. The pods eventually recover, but some cluster operators and user workloads become unavailable while their pods are being terminated.
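The availability impact can be observed with a standard watch on the cluster operators during the drain; a minimal sketch (the exact checks used for this report are not recorded here):

  # Watch cluster operator availability while the node is being drained
  oc get clusteroperators -w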
Version-Release number of selected component (if applicable):
4.14.0-rc.0 sriov-network-operator.v4.14.0-202308242104
How reproducible:
100%
Steps to Reproduce:
1. On an SNO, create a SriovNetworkNodePolicy:

   apiVersion: sriovnetwork.openshift.io/v1
   kind: SriovNetworkNodePolicy
   metadata:
     name: snnp1
     namespace: openshift-sriov-network-operator
   spec:
     deviceType: vfio-pci
     isRdma: false
     nicSelector:
       pfNames:
       - ens2f3#32-33
     nodeSelector:
       node-role.kubernetes.io/master: ""
     numVfs: 48
     priority: 99
     resourceName: snnp1

2. Check the sriov-network-config-daemon logs and the status of the other pods (example commands below).
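For step 2, commands along these lines can be used (the daemonset name and namespace follow the operator defaults; adjust if the deployment differs):

  # Follow the sriov-network-config-daemon logs (single pod on an SNO)
  oc -n openshift-sriov-network-operator logs -f ds/sriov-network-config-daemon

  # Watch pod status across all namespaces while the policy is applied
  oc get pods -A -w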
Actual results:
The logs show that the node gets drained and all pods except DaemonSet-managed pods are evicted and restarted:

I0911 14:07:10.657178 22691 daemon.go:804] annotateNode(): Annotate node sno.kni-qe-24.lab.eng.rdu2.redhat.com with: Draining
I0911 14:07:10.680706 22691 daemon.go:599] nodeStateSyncHandler(): pause MCP
I0911 14:07:10.690525 22691 daemon.go:916] pauseMCP(): pausing MCP
I0911 14:07:10.737330 22691 daemon.go:952] pauseMCP(): pause MCP master
I0911 14:07:10.773584 22691 daemon.go:804] annotateNode(): Annotate node sno.kni-qe-24.lab.eng.rdu2.redhat.com with: Draining_MCP_Paused
I0911 14:07:10.829568 22691 daemon.go:608] nodeStateSyncHandler(): drain node
I0911 14:07:10.829828 22691 daemon.go:1004] drainNode(): Update prepared
I0911 14:07:10.829862 22691 daemon.go:1014] drainNode(): Start draining
E0911 14:07:13.663558 22691 daemon.go:137] WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-node-tuning-operator/tuned-r9xrc, openshift-dns/dns-default-wrr86, openshift-dns/node-resolver-4tj87, openshift-image-registry/node-ca-wphrb, openshift-ingress-canary/ingress-canary-nf7p9, openshift-local-storage/diskmaker-manager-44hcj, openshift-logging/collector-25gbm, openshift-machine-config-operator/machine-config-daemon-hrlhx, openshift-machine-config-operator/machine-config-server-v5pmr, openshift-monitoring/node-exporter-s6m5q, openshift-multus/multus-additional-cni-plugins-7pjff, openshift-multus/multus-networkpolicy-gkwwp, openshift-multus/multus-w525k, openshift-multus/network-metrics-daemon-l65kc, openshift-ovn-kubernetes/ovnkube-node-qw8fz, openshift-ptp/linuxptp-daemon-hsrz4, openshift-sriov-network-operator/network-resources-injector-h7lk8, openshift-sriov-network-operator/operator-webhook-zmbpq, openshift-sriov-network-operator/sriov-device-plugin-vnz49, openshift-sriov-network-operator/sriov-network-config-daemon-xjvwj, vran-acceleration-operators/accelerator-discovery-bl88c, vran-acceleration-operators/sriov-device-plugin-zhtxj, vran-acceleration-operators/sriov-fec-daemonset-gs2bn
I0911 14:07:13.665220 22691 daemon.go:137] evicting pod openshift-apiserver-operator/openshift-apiserver-operator-869988898d-pxxdg
I0911 14:07:13.665302 22691 daemon.go:137] evicting pod openshift-cluster-node-tuning-operator/cluster-node-tuning-operator-57f8dbf5f-q5p6n
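The node's drain state can also be checked through its annotation; the annotation key below (sriovnetwork.openshift.io/state) is an assumption inferred from the annotateNode() messages above:

  # Print the node state annotation set by the config daemon (key assumed)
  oc get node sno.kni-qe-24.lab.eng.rdu2.redhat.com \
    -o jsonpath='{.metadata.annotations.sriovnetwork\.openshift\.io/state}'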
Expected results:
Creating a new SriovNetworkNodePolicy on the SNO node does not cause other pods to be restarted.
Additional info:
This appears to be a regression introduced in 4.14; the same scenario does not reproduce on 4.13. A must-gather is attached.