OpenShift Virtualization / CNV-23732

[2156902] VM latency checkup - Checkup not performing a teardown in case of setup failure



      Description of problem:
When a checkup encounters a setup failure, the components created by the checkup job are not deleted (no teardown is performed).

      Version-Release number of selected component (if applicable):
kubevirt-hyperconverged-operator.v4.12.0   OpenShift Virtualization   4.12.0   kubevirt-hyperconverged-operator.v4.11.1   Succeeded

      Client Version: 4.12.0-rc.6
      Kustomize Version: v4.5.7
      Server Version: 4.12.0-rc.6
      Kubernetes Version: v1.25.4+77bec7a

      How reproducible:
Create a checkup ConfigMap with a nonexistent node specified as the source node. The source virt-launcher pod stays in the Pending state, never reaches Running, and the actual checkup never starts.

      Steps to Reproduce:
1. Create a namespace:
      oc new-project test-latency

2. Create a bridge with this YAML:
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br10
spec:
  desiredState:
    interfaces:
      - name: br10
        type: linux-bridge
        state: up
        ipv4:
          auto-dns: true
          dhcp: false
          enabled: false
        ipv6:
          auto-dns: true
          autoconf: false
          dhcp: false
          enabled: false
        bridge:
          options:
            stp:
              enabled: false
          port:
            - name: ens9
  nodeSelector:
    node-role.kubernetes.io/worker: ''
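
(Optional sanity check, not part of the original report.) Before continuing, the bridge policy can be verified with the kubernetes-nmstate resources, assuming the usual short names:

oc get nncp br10
oc get nnce | grep br10

Both should eventually report a successful/available status on the worker nodes; otherwise the checkup VMIs will not get the secondary network.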

3. Create a NAD with this YAML:
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: bridge-network-nad
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "br10",
      "plugins": [
        { "type": "cnv-bridge", "bridge": "br10" }
      ]
    }
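
(Optional sanity check, not part of the original report.) Confirm the NAD exists in the test namespace:

oc get network-attachment-definitions bridge-network-nad -n test-latency -o yaml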

4. Create a service account, role, and role binding:
cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vm-latency-checkup-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubevirt-vm-latency-checker
rules:
  - apiGroups: ["kubevirt.io"]
    resources: ["virtualmachineinstances"]
    verbs: ["get", "create", "delete"]
  - apiGroups: ["subresources.kubevirt.io"]
    resources: ["virtualmachineinstances/console"]
    verbs: ["get"]
  - apiGroups: ["k8s.cni.cncf.io"]
    resources: ["network-attachment-definitions"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubevirt-vm-latency-checker
subjects:
  - kind: ServiceAccount
    name: vm-latency-checkup-sa
roleRef:
  kind: Role
  name: kubevirt-vm-latency-checker
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kiagnose-configmap-access
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kiagnose-configmap-access
subjects:
  - kind: ServiceAccount
    name: vm-latency-checkup-sa
roleRef:
  kind: Role
  name: kiagnose-configmap-access
  apiGroup: rbac.authorization.k8s.io
EOF
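
(Optional sanity check, not part of the original report.) The RBAC wiring can be verified by impersonating the service account:

oc auth can-i create virtualmachineinstances.kubevirt.io -n test-latency --as=system:serviceaccount:test-latency:vm-latency-checkup-sa
oc auth can-i update configmaps -n test-latency --as=system:serviceaccount:test-latency:vm-latency-checkup-sa

Both commands should print "yes".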

5. Create the ConfigMap with the "spec.param.max_desired_latency_milliseconds" field set to "0" and "spec.param.source_node" set to a nonexistent node:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevirt-vm-latency-checkup-config
data:
  spec.timeout: 5m
  spec.param.network_attachment_definition_namespace: "manual-latency-check"
  spec.param.network_attachment_definition_name: "bridge-network-nad"
  spec.param.max_desired_latency_milliseconds: "0"
  spec.param.sample_duration_seconds: "5"
  spec.param.source_node: non-existent-node
  spec.param.target_node: cnv-qe-14.cnvqe.lab.eng.rdu2.redhat.com
EOF
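
(Note, not part of the original report.) The checkup reads its input from this ConfigMap and is expected to write its results back into it, so after a run it can be inspected with:

oc get configmap kubevirt-vm-latency-checkup-config -n test-latency -o yaml

For a setup failure like this one, a failure reason should show up in the ConfigMap data (the kiagnose convention uses keys such as status.succeeded and status.failureReason; treat the exact key names as an assumption here).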

6. Create a job:
cat <<EOF | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: kubevirt-vm-latency-checkup
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: vm-latency-checkup-sa
      restartPolicy: Never
      containers:
        - name: vm-latency-checkup
          image: brew.registry.redhat.io/rh-osbs/container-native-virtualization-vm-network-latency-checkup:v4.12.0
          securityContext:
            runAsUser: 1000
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
            runAsNonRoot: true
            seccompProfile:
              type: "RuntimeDefault"
          env:
            - name: CONFIGMAP_NAMESPACE
              value: test-latency
            - name: CONFIGMAP_NAME
              value: kubevirt-vm-latency-checkup-config
EOF
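
(Optional, not part of the original report.) To observe the failure, wait for the job to finish and look at the resources it created, e.g.:

oc wait job/kubevirt-vm-latency-checkup -n test-latency --for=condition=Failed --timeout=6m
oc logs job/kubevirt-vm-latency-checkup -n test-latency
oc get job,pods,vmi -n test-latency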

      Actual results:
When the job is deleted, the pods and VMIs are not deleted:
oc get all
NAME                                         READY   STATUS    RESTARTS   AGE
pod/latency-nonexistent-node-job-qt4wk       0/1     Error     0          74m
pod/virt-launcher-latency-check-source-4fqgk 0/2     Pending   0          74m
pod/virt-launcher-latency-check-target-smj9r 2/2     Running   0          74m

NAME                                   COMPLETIONS   DURATION   AGE
job.batch/latency-nonexistent-node-job 0/1           74m        74m

NAME                                                      AGE   PHASE        IP               NODENAME                                  READY
virtualmachineinstance.kubevirt.io/latency-check-source   74m   Scheduling                                                              False
virtualmachineinstance.kubevirt.io/latency-check-target   74m   Running      192.168.100.20   cnv-qe-14.cnvqe.lab.eng.rdu2.redhat.com   True
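
(Workaround, not part of the original report.) Until teardown on setup failure is fixed, the leftovers have to be removed by hand, using the names from the output above (in whichever namespace the checkup ran):

oc delete job latency-nonexistent-node-job
oc delete vmi latency-check-source latency-check-target

Deleting the VMIs also removes their virt-launcher pods; the failed checkup pod is garbage-collected along with the job.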

      Expected results:
All the resources created by the job (the VMIs and their virt-launcher pods) are deleted when the job is deleted.
