OpenShift Virtualization / CNV-68835

virt-launcher pods go NotReady during high scale node density testing



      Description of problem:

While testing 4.20 I can occasionally reproduce a state under high node density where a handful of virt-launcher pods go from Running to NotReady. In these cases the NotReady pods are always co-located on the same worker node (although the specific node can change from test to test), and the VMIs report the generic 10.0.2.2 IP. However, the console shows the VM is up and otherwise running fine; it can ping out without issue, etc.
Note this reproduces even when running the 4.19 crio version, so it is likely not related to the crio change investigated in OCPBUGS-60605.
      
      # oc get pod -n virt-density -o wide | grep NotReady
      virt-launcher-virt-density-192-w6dfx   2/3     NotReady   0          3h23m   10.131.0.91    worker00   <none>           1/1
      virt-launcher-virt-density-207-vp2ft   2/3     NotReady   0          3h23m   10.131.0.96    worker00   <none>           1/1
      virt-launcher-virt-density-240-tswg4   2/3     NotReady   0          3h23m   10.131.0.108   worker00   <none>           1/1
      virt-launcher-virt-density-279-v7wh9   2/3     NotReady   0          3h22m   10.131.0.120   worker00   <none>           1/1
      virt-launcher-virt-density-297-s49xh   2/3     NotReady   0          3h22m   10.131.0.127   worker00   <none>           1/1
      virt-launcher-virt-density-313-qtfn2   2/3     NotReady   0          3h22m   10.131.0.132   worker00   <none>           1/1
      
      # oc get vmi -A | grep False
      virt-density   virt-density-192   3h23m   Running   10.0.2.2       worker00   False
      virt-density   virt-density-207   3h23m   Running   10.0.2.2       worker00   False
      virt-density   virt-density-240   3h23m   Running   10.0.2.2       worker00   False
      virt-density   virt-density-279   3h23m   Running   10.0.2.2       worker00   False
      virt-density   virt-density-297   3h23m   Running   10.0.2.2       worker00   False
      virt-density   virt-density-313   3h23m   Running   10.0.2.2       worker00   False
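To spot this state quickly, the two greps above can be wrapped in small stdin-driven helpers (a sketch; the column positions assume the default `oc get ... -o wide --no-headers` layout shown in the outputs above):

```shell
# Reads `oc get pod -o wide --no-headers` output on stdin and counts
# NotReady pods per node ($3 = status, $7 = node name), to confirm the
# affected launcher pods are co-located on a single worker.
count_notready_by_node() {
  awk '$3 == "NotReady" { count[$7]++ } END { for (n in count) print n, count[n] }'
}

# Reads `oc get vmi --no-headers` output on stdin and prints VMIs stuck
# on the placeholder 10.0.2.2 address ($4 = reported IP, $5 = node).
vmis_with_placeholder_ip() {
  awk '$4 == "10.0.2.2" { print $1, $5 }'
}
```

Usage against the cluster would look like `oc get pod -n virt-density -o wide --no-headers | count_notready_by_node` and `oc get vmi -n virt-density --no-headers | vmis_with_placeholder_ip`.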
      
Note we do have one known NotReady scenario during 4.19 mass migration testing, tracked in CNV-67948, but it is not clear yet whether it is related.

      Version-Release number of selected component (if applicable):

      OCP 4.20.0-ec.6, Virt 4.20.0-144

      How reproducible:

Some runs of 200 VMs per node are successful; others hit this NotReady error, usually for only ~5 pods each time in this environment.

      Steps to Reproduce:

1. Start 200 VMs per node at once and check pod states
      
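The workload can be sketched as a generator emitting minimal VirtualMachine manifests to apply in one shot. The manifest shape (kubevirt.io/v1 with a cirros containerDisk) and the namespace are assumptions for illustration; the actual test harness may differ:

```shell
# Emit N minimal VirtualMachine manifests on stdout so they can be
# applied at once with `emit_vms 200 | oc apply -f -`.
emit_vms() {
  count=$1
  i=1
  while [ "$i" -le "$count" ]; do
    cat <<EOF
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: virt-density-$i
  namespace: virt-density
spec:
  runStrategy: Always
  template:
    spec:
      domain:
        devices:
          disks:
          - name: rootdisk
            disk:
              bus: virtio
        memory:
          guest: 128Mi
      volumes:
      - name: rootdisk
        containerDisk:
          image: quay.io/kubevirt/cirros-container-disk-demo
EOF
    i=$((i + 1))
  done
}
```

After the VMs start, pod state can be checked with `oc get pod -n virt-density -o wide | grep -c NotReady`.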

      Actual results:

      Not all pods stay in Running state

      Expected results:

      All pods stay in Running state

      Additional info:

The affected pods flip from 3/3 Running to 2/3 NotReady within the first few minutes after start:
      virt-density                                       virt-launcher-virt-density-313-qtfn2                              3/3     Running                     0             2m24s   10.131.0.132     worker00   <none>           1/1
      
      virt-density                                       virt-launcher-virt-density-313-qtfn2                              2/3     NotReady                    0             3m23s   10.131.0.132     worker00   <none>           1/1
      
      
      Worker00 is not too loaded in terms of resources:
      
        Resource                       Requests            Limits
        --------                       --------            ------
        cpu                            22346m (17%)        23125m (18%)
        memory                         195895318784 (76%)  20100M (7%)
      
      # oc adm top node
      NAME       CPU(cores)   CPU(%)   MEMORY(bytes)   MEMORY(%)
      master-0   1335m        6%       11285Mi         23%
      master-1   4938m        25%      16352Mi         33%
      master-2   1367m        7%       13858Mi         28%
      worker00   2358m        1%       148680Mi        60%
      worker01   8426m        6%       154793Mi        63%
      worker02   4399m        3%       153805Mi        62%
      
I tried rebooting the guest OS and the pod state did not change. Interestingly, a virtctl migrate test on a VM worked, and afterwards the pod state was fine and remained Running:
      
      virt-density                                       virt-launcher-virt-density-313-ncmm7                              3/3     Running     0               56s     10.128.2.238     worker02   <none>           1/1
      virt-density                                       virt-launcher-virt-density-313-qtfn2                              0/3     Completed   0               3h27m   10.131.0.132     worker00   <none>           1/1
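Since a live migration cleared the state for that one VM, a possible bulk workaround is to migrate every affected VMI. A sketch that reads `oc get vmi -n virt-density --no-headers` output on stdin and prints the `virtctl migrate` command for each stuck VMI (it echoes the commands rather than running them, so it can be reviewed as a dry run first):

```shell
# Select VMIs stuck on the placeholder 10.0.2.2 address ($4 = IP) and
# print a `virtctl migrate` command for each; drop the `echo` to execute.
migrate_stuck_vmis() {
  awk '$4 == "10.0.2.2" { print $1 }' | while read -r vmi; do
    echo virtctl migrate -n virt-density "$vmi"
  done
}
```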

              ffossemo@redhat.com Federico Fossemo
              jhopper@redhat.com Jenifer Abrams
              Denys Shchedrivyi Denys Shchedrivyi