-
Bug
-
Resolution: Done-Errata
-
Normal
-
None
-
4.14
-
No
-
SDN Sprint 254, SDN Sprint 255, SDN Sprint 256, SDN Sprint 257, SDN Sprint 258, SDN Sprint 259
-
6
-
False
-
-
-
Bug Fix
-
Done
-
Troubleshoot
-
This issue follows from OCPBUGS-29940: https://issues.redhat.com/browse/OCPBUGS-29940
After applying the workaround suggested in [1][2] with "oc adm must-gather --node-name", we found another issue: must-gather creates a debug pod on every master node and gets stuck for a while, because the gather_network_logs_basics script loops over all of them, including nodes that are NotReady. Filtering out the NotReady nodes would allow us to apply the workaround.
The gather_network_logs_basics script selects the master nodes by label (node-role.kubernetes.io/master) and stores them in the CLUSTER_NODES variable. It then passes that list to gather_multus_logs ($CLUSTER_NODES), which loops through the master nodes and runs oc debug against each one.
collection-scripts/gather_network_logs_basics
...
CLUSTER_NODES="${@:-$(oc get node -l node-role.kubernetes.io/master -oname)}"
/usr/bin/gather_multus_logs $CLUSTER_NODES
...
collection-scripts/gather_multus_logs
...
function gather_multus_logs {
  for NODE in "$@"; do
    nodefilename=$(echo "$NODE" | sed -e 's|node/||')
    out=$(oc debug "${NODE}" -- \
      /bin/bash -c "cat $INPUT_LOG_PATH" 2>/dev/null) && echo "$out" 1> "${OUTPUT_LOG_PATH}/multus-log-$nodefilename.log"
  done
}
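This could be resolved with something similar to the following, which keeps only the master nodes whose Ready condition is True. The node/ prefix that -oname produced is preserved, since gather_multus_logs passes each entry straight to oc debug and strips node/ with sed:
CLUSTER_NODES="${@:-$(oc get node -l node-role.kubernetes.io/master -o json | jq -r '.items[] | select(.status.conditions[] | select(.type=="Ready" and .status=="True")) | "node/" + .metadata.name')}"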
/usr/bin/gather_multus_logs $CLUSTER_NODES
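If relying on jq inside the collection script is undesirable, the same filtering can be sketched with awk over the tabular output of oc get node --no-headers. This is only an illustration, not the fix that was merged: the function name filter_ready_nodes is hypothetical, and it assumes the default column order in which NAME is the first column and STATUS the second.

```shell
#!/bin/bash
# Hypothetical helper (not part of must-gather): reads the output of
# "oc get node ... --no-headers" on stdin and prints node/<name> for every
# node whose STATUS column is "Ready" or starts with "Ready," (for example
# "Ready,SchedulingDisabled"). "NotReady" nodes are dropped.
filter_ready_nodes() {
  awk '$2 ~ /^Ready(,|$)/ { print "node/" $1 }'
}

# Possible use in gather_network_logs_basics (assumption, shown as a comment):
# CLUSTER_NODES="${@:-$(oc get node -l node-role.kubernetes.io/master --no-headers | filter_ready_nodes)}"
# /usr/bin/gather_multus_logs $CLUSTER_NODES
```

As written, cordoned nodes that are still Ready are kept, because oc debug can still reach them; only nodes reported as NotReady are skipped.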
[1] - https://access.redhat.com/solutions/6962230
[2] - https://issues.redhat.com/browse/OCPBUGS-29940
- is cloned by
-
OCPBUGS-42835 gather_network_logs_basics script when node is in the NotReady
- Closed
-
OCPBUGS-43055 gather_network_logs_basics script when node is in the NotReady [backport 4.17]
- Closed
- is depended on by
-
OCPBUGS-42835 gather_network_logs_basics script when node is in the NotReady
- Closed
-
OCPBUGS-43055 gather_network_logs_basics script when node is in the NotReady [backport 4.17]
- Closed
- links to
-
RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update