Type: Feature Request
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: oc
Labels:
- must-gather
- oc

Blocked: False
Blocked Reason: None
Ready: False
Color Status: Not Selected

RFE requested based on the OCPBUGS-29940 discussion

Description of problem:

Despite one node being in a NotReady state, executing oc adm must-gather can result in the must-gather pod being scheduled on the NotReady node. The pod carries a single toleration, operator: Exists, with no key. An empty key with operator Exists matches all keys, values and effects, so the pod tolerates every taint, including the ones applied to an unreachable node.

$ oc get pod must-gather-nmhwq -n openshift-must-gather-9jtzz -o yaml | grep -i tolerations -A1
  tolerations:
  - operator: Exists
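
The matching rule described above can be sketched as a small shell function. This is only an illustration of the documented Kubernetes toleration semantics, not the scheduler's actual code; the function name tolerates and its argument order are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper illustrating how a toleration is matched against a taint.
# Usage: tolerates TOL_KEY TOL_OPERATOR TOL_VALUE TOL_EFFECT TAINT_KEY TAINT_VALUE TAINT_EFFECT
# Returns 0 when the toleration matches the taint, 1 otherwise.
tolerates() {
    tol_key=$1; tol_op=$2; tol_value=$3; tol_effect=$4
    taint_key=$5; taint_value=$6; taint_effect=$7

    # An empty toleration effect matches every taint effect.
    if [ -n "$tol_effect" ] && [ "$tol_effect" != "$taint_effect" ]; then
        return 1
    fi
    # An empty toleration key matches every taint key.
    if [ -n "$tol_key" ] && [ "$tol_key" != "$taint_key" ]; then
        return 1
    fi
    case "$tol_op" in
        # Exists ignores the taint value entirely.
        Exists) return 0 ;;
        # Equal is the default operator and compares the values.
        Equal|"") [ "$tol_value" = "$taint_value" ] ;;
        *) return 1 ;;
    esac
}

# The must-gather toleration (empty key, operator Exists, empty effect)
# matches both taints found on the NotReady node.
for effect in NoExecute NoSchedule; do
    if tolerates "" Exists "" "" node.kubernetes.io/unreachable "" "$effect"; then
        echo "node.kubernetes.io/unreachable:$effect is tolerated"
    fi
done
```

A toleration with a specific key, for example node.kubernetes.io/not-ready, would not match the unreachable taints in this sketch, which is why the key-less form is the one that lets the pod land on any tainted node.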

How reproducible:

Increase the kube-scheduler log level:

$ oc patch kubeschedulers.operator/cluster --type=json -p '[{"op": "replace", "path": "/spec/logLevel", "value": "TraceAll" }]'

Stop the kubelet on one node so that the node becomes NotReady and receives the NoExecute/NoSchedule taints:

$ ssh core@mno-ctlplane-0.5g-deployment.lab
$ sudo systemctl stop kubelet

$ oc get nodes
NAME                               STATUS     ROLES                         AGE     VERSION
mno-ctlplane-0.5g-deployment.lab   NotReady   control-plane,master,worker   7h27m   v1.27.8+4fab27b
mno-ctlplane-1.5g-deployment.lab   Ready      control-plane,master,worker   7h27m   v1.27.8+4fab27b
mno-ctlplane-2.5g-deployment.lab   Ready      control-plane,master,worker   7h27m   v1.27.8+4fab27b
mno-worker-0.5g-deployment.lab     Ready      worker                        7h1m    v1.27.8+4fab27b
mno-worker-1.5g-deployment.lab     Ready      worker                        7h1m    v1.27.8+4fab27b

$ oc describe node mno-ctlplane-0.5g-deployment.lab | grep -A2 -i taint
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule

Run the must-gather command, which in a previous run had created the must-gather pod on the master-0 node. The new pod is again assigned to that node and now stays Pending:

$ oc adm must-gather 
$ oc get pod -A -o wide|grep must
openshift-must-gather-qzctk               must-gather-jzlt6          0/2     Pending       0                15s     <none>            mno-ctlplane-0.5g-deployment.lab   <none>           <none>

Scheduler logs showing the eligible nodes and their scores:

$ oc logs openshift-kube-scheduler-mno-ctlplane-1.5g-deployment.lab -f -n openshift-kube-scheduler| grep must
...
I0222 17:44:52.501842       1 schedule_one.go:748] "Calculated node's final score for pod" pod="openshift-must-gather-qzctk/must-gather-jzlt6" node="mno-ctlplane-0.5g-deployment.lab" score=637
I0222 17:44:52.501849       1 schedule_one.go:748] "Calculated node's final score for pod" pod="openshift-must-gather-qzctk/must-gather-jzlt6" node="mno-ctlplane-1.5g-deployment.lab" score=614
I0222 17:44:52.501856       1 schedule_one.go:748] "Calculated node's final score for pod" pod="openshift-must-gather-qzctk/must-gather-jzlt6" node="mno-ctlplane-2.5g-deployment.lab" score=613
...
$ oc get pod must-gather-ncv8q -n openshift-must-gather-f5mtl -o yaml | grep -i tolerations -A1
  tolerations:
  - operator: Exists
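
To compare the candidates at a glance, the "Calculated node's final score for pod" lines shown above can be reduced to a ranked list. The following is a minimal sketch that only assumes the node="..." score=N layout seen in those log lines; the function name node_scores is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read kube-scheduler log lines on stdin and print
# "<score> <node>" pairs, highest score first. Lines without a
# node="..." score=N pair are ignored.
node_scores() {
    sed -n 's/.* node="\([^"]*\)" score=\([0-9][0-9]*\).*/\2 \1/p' | sort -rn
}

# Possible usage against the scheduler pod from this report (not run here):
#   oc logs openshift-kube-scheduler-mno-ctlplane-1.5g-deployment.lab \
#       -n openshift-kube-scheduler | grep must | node_scores
```

Applied to the three log lines in this report, the NotReady node mno-ctlplane-0.5g-deployment.lab comes out on top with a score of 637, ahead of 614 and 613 for the two Ready control-plane nodes.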

As per the OCPBUGS-29940 discussion, the must-gather pod's toleration of any taint is intended. However, from a customer's point of view this behaviour is unexpected: troubleshooting tools should be scheduled on a Ready node by default. Introducing a new flag to change this behaviour based on custom taints/tolerations would help address this.
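
Until such a flag exists, one possible interim approach is to pick a Ready node by hand and pin the must-gather pod to it. The sketch below filters the oc get nodes table as shown earlier in this report; the function names are made up for this example, and the commented usage assumes the installed oc client offers the --node-name option of oc adm must-gather, which should be checked with oc adm must-gather --help first.

```shell
#!/bin/sh
# Hypothetical helpers that read the default `oc get nodes` table on stdin.

# Print the names of all nodes whose STATUS column is exactly "Ready".
# Nodes reported as e.g. "Ready,SchedulingDisabled" are skipped on purpose.
ready_nodes() {
    awk 'NR > 1 && $2 == "Ready" { print $1 }'
}

# Print the first Ready node that has the master role.
first_ready_master() {
    awk 'NR > 1 && $2 == "Ready" && $3 ~ /(^|,)master(,|$)/ { print $1; exit }'
}

# Possible usage (not run here; assumes --node-name is available):
#   node=$(oc get nodes | first_ready_master)
#   [ -n "$node" ] && oc adm must-gather --node-name="$node"
```

With the node list from this report, first_ready_master would select mno-ctlplane-1.5g-deployment.lab and skip the NotReady mno-ctlplane-0.5g-deployment.lab. This does not change the default behaviour that the RFE asks about; it only sidesteps it for a single run.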

is related to

RFE-6505 [RFE] Enhance must-gather Tool for Full Automation in OpenShift Environments

Refinement

links to

Support case 03743031

Assignee: Gaurav Singh
Reporter: Jorge Claret Membrado
Votes: 0
Watchers: 8
Created: 2024/06/03 10:50 PM
Updated: 2025/03/04 9:25 PM
