Details

Type: Bug
Resolution: Not a Bug
Priority: Critical
Fix Version/s: None
Affects Version/s: 3.11.z
Component/s: Node / Kubelet
Labels:
- triaged

Severity:
Critical
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:

None

SFDC Cases Counter:
SFDC Cases Links:

Description

Description of problem:

One of my customers has been facing an issue on 2 different nodes, with very similar symptoms on both.
On both nodes the kubelet recorded a NodeReady event, although the nodes had never been marked as NotReady beforehand:

###
Jan 19 23:35:44 ocpxxxx.local atomic-openshift-node[4090]: I0119 23:35:44.142960    4090 kubelet_node_status.go:441] Recording NodeReady event message for node ocpxxxx.local
###
Jan 19 07:48:01 ocpxxxx.local atomic-openshift-node[195489]: I0119 07:48:01.713226  195489 kubelet_node_status.go:441] Recording NodeReady event message for node ocpxxxx.local
###
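
To check whether any NotReady transition precedes these NodeReady messages, the node journal can be scanned for all status events. The following is a minimal sketch, assuming the kubelet log lines follow the format shown above; the function name `node_status_events` is hypothetical, and matching `NodeNotReady` through the same pattern is an assumption based on the `NodeReady` wording.

```python
import re

# Matches kubelet messages of the form shown in the logs above, e.g.
# "Recording NodeReady event message for node ocpxxxx.local".
# Assumption: NotReady transitions are logged with the same wording.
EVENT_RE = re.compile(r"Recording (Node\w+) event message for node (\S+)")


def node_status_events(lines):
    """Return (timestamp, node, event) tuples for each status event line."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            # The journal prefix starts with "Mon DD HH:MM:SS".
            timestamp = " ".join(line.split()[:3])
            events.append((timestamp, match.group(2), match.group(1)))
    return events
```

Feeding it the lines of the node service journal and looking for a `NodeReady` entry with no earlier `NodeNotReady` entry would reproduce the observation reported here.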

Because the nodes were never marked as NotReady, the pods were not rescheduled; however, the pod addresses in the Endpoints were listed under notReadyAddresses:

###
[RED3QY9@popshba01c ~]$ oc get ep -n xxx genesysxxxx -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  creationTimestamp: 2022-03-11T12:57:34Z
  labels:
    app.kubernetes.io/instance: genesys-xxxx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: genesys-xxxx
    app.kubernetes.io/version: 1.0.0
    helm.sh/chart: genesys-xxxx
  name: genesys-services-gestione-esiti-ricontatto
  namespace: xxxx
  resourceVersion: "675954227"
  selfLink: /api/v1/namespaces/xxx/endpoints/genesys-sxxxx
  uid: d1e2f763-a13a-11ec-b2c6-5448107ce401
subsets:
- notReadyAddresses:
  - ip: 10.xxxx
    nodeName: ocpxxxx.local
    targetRef:
      kind: Pod
      name: genesys-xxxx
      namespace: xxxx
      resourceVersion: "675954219"
      uid: c4e5b7c8-8223-11ed-bcbe-5448107ce41b
  ports:
  - name: 8443-tcp
    port: 8443
    protocol: TCP
###
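
To see which other Endpoints are affected and on which nodes, the `notReadyAddresses` entries can be grouped by `nodeName`. This is a rough sketch, assuming the output of `oc get ep --all-namespaces -o json` has been loaded into a dictionary; the function name `not_ready_by_node` is hypothetical, and only fields visible in the YAML above are read.

```python
from collections import defaultdict


def not_ready_by_node(endpoints_list):
    """Group not-ready endpoint addresses by the node that hosts the pod.

    `endpoints_list` is the parsed JSON of an Endpoints list
    (a dict with an "items" key).
    """
    result = defaultdict(list)
    for endpoints in endpoints_list.get("items", []):
        metadata = endpoints.get("metadata", {})
        # "subsets" may be missing or null when a service has no pods.
        for subset in endpoints.get("subsets") or []:
            for address in subset.get("notReadyAddresses") or []:
                node = address.get("nodeName", "<unknown>")
                pod = (address.get("targetRef") or {}).get("name", "<unknown>")
                result[node].append(
                    (metadata.get("namespace"), metadata.get("name"), pod)
                )
    return dict(result)
```

A node that reports Ready but appears in this result with many pods would match the behaviour described in this report.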

This looks similar to bug https://bugzilla.redhat.com/show_bug.cgi?id=1814804.
The customer would like to know whether the mitigation submitted there is valid and can avoid restarting all pods running on the affected OpenShift nodes, as described in [1].


OpenShift version: 3.11.153
RHEL version: 7.7

They have ELS subscription.

Thanks
Angelo

[1] https://access.redhat.com/solutions/5002781

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

People

Assignee: Neelesh Agrawal

Reporter: Angelo Gabrieli

QA Contact: Sunil Choudhary

Votes: 1

Watchers: 5

Dates

Created: 2023/01/20 3:24 PM

Updated: 2023/02/09 2:40 PM

Resolved: 2023/02/09 2:40 PM