Type: Story
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- blue
- kubelet

Work Type: Upstream
Story Points: 5
Blocked: False
Blocked Reason: None
Ready: False
Epic Link: Enable Evented PLEG via TechPreview
Feature Link: OCPSTRAT-296 - Openshift Kubelet: Pod Lifecycle Event Generator (PLEG)
Sprint: OCPNODE Sprint 243 (Blue)

Enabling the EventedPLEG feature gate via the Machine Config Operator causes pods to go into the "CrashLoopBackOff" or "Error" state.
MCO Branch: https://github.com/openshift/machine-config-operator/pull/3917/files

The pods go into the CrashLoopBackOff state because duplicate containers are created and started within the same pod, and the duplicates then race to acquire the same resources (ports).

Example: the "bind: address already in use" error is observed on many pods.

ci-ln-09tlpi2-72292-flgxf-master-2.log:264184:Sep 18 15:39:10.886241 ci-ln-09tlpi2-72292-flgxf-master-2 kubenswrapper[2308]:         time="2023-09-18T15:39:01Z" level=fatal msg="failed to create listener: failed to listen on 0.0.0.0:5443: listen tcp 0.0.0.0:5443: bind: address already in use"
ci-ln-09tlpi2-72292-flgxf-master-2.log:264236:Sep 18 15:39:11.154052 ci-ln-09tlpi2-72292-flgxf-master-2 kubenswrapper[2308]:         F0918 15:38:21.629025       1 cmd.go:56] failed to create listener: failed to listen on 0.0.0.0:6443: listen tcp 0.0.0.0:6443: bind: address already in use
ci-ln-09tlpi2-72292-flgxf-master-2.log:265167:Sep 18 15:39:21.155012 ci-ln-09tlpi2-72292-flgxf-master-2 kubenswrapper[2308]:         F0918 15:38:23.445192       1 standalone_apiserver.go:120] listen tcp 0.0.0.0:8443: bind: address already in use
ci-ln-09tlpi2-72292-flgxf-master-2.log:265182:Sep 18 15:39:21.155012 ci-ln-09tlpi2-72292-flgxf-master-2 kubenswrapper[2308]:         E0918 15:38:54.680976       1 run.go:74] "command failed" err="failed to run groups: failed to listen on secure address: listen tcp :8443: bind: address already in use"

The above issue has been identified and root-caused in https://issues.redhat.com/browse/OCPNODE-1818

Fix this issue, and also fix the flakiness of the job https://testgrid.k8s.io/sig-node-cri-o#ci-crio-cgroupv1-evented-pleg

Test PR with a potential fix: https://github.com/kubernetes/kubernetes/pull/120480

clones
OCPNODE-1818 Debug the failures while enabling EventedPLEG featuregate in Kubelet (Closed)

is cloned by
OCPNODE-1872 [UPSTREAM] Fix flaky node e2e tests when evented pleg is enabled (Closed)

Assignee: Sai Ramesh Vanka
Reporter: Sai Ramesh Vanka
Votes: 0
Watchers: 1
Created: 2023/09/28 12:59 PM
Updated: 2024/07/30 1:02 AM
Resolved: 2023/10/19 9:54 AM
