OpenShift Bugs / OCPBUGS-33959

gather_network_logs_basics script gets stuck when a node is in the NotReady state


    • No
    • SDN Sprint 254, SDN Sprint 255, SDN Sprint 256, SDN Sprint 257, SDN Sprint 258, SDN Sprint 259
    • 6
    • False
    • Previously, when you used the `must-gather` tool, a Multus Container Network Interface (CNI) log file, `multus.log`, was stored in a node's file system. This situation caused the tool to generate unnecessary debug pods in a node. With this release, the Multus CNI no longer creates a `multus.log` file, and instead uses a CNI plugin pattern to inspect any logs for Multus DaemonSet pods in the `openshift-multus` namespace. (link:https://issues.redhat.com/browse/OCPBUGS-33959[*OCPBUGS-33959*])
    • Bug Fix
    • Done
    • Troubleshoot

      This issue is a follow-up to https://issues.redhat.com/browse/OCPBUGS-29940

      After applying the suggested workaround [1][2] with "oc adm must-gather --node-name", we found another issue: must-gather creates a debug pod on every master node and gets stuck for a while because the gather_network_logs_basics script loops over all of them. Filtering out the NotReady nodes would allow us to apply the workaround.

      The gather_network_logs_basics script gets the master nodes by label (node-role.kubernetes.io/master) and saves them in the CLUSTER_NODES variable. It then passes them as arguments to gather_multus_logs ($CLUSTER_NODES), which loops through the list of master nodes and runs oc debug against each one.

      collection-scripts/gather_network_logs_basics
      ...
      CLUSTER_NODES="${@:-$(oc get node -l node-role.kubernetes.io/master -oname)}"
      /usr/bin/gather_multus_logs $CLUSTER_NODES
      ...
      
      collection-scripts/gather_multus_logs
      ...
      function gather_multus_logs {
        for NODE in "$@"; do
          nodefilename=$(echo "$NODE" | sed -e 's|node/||')
          out=$(oc debug "${NODE}" -- \
          /bin/bash -c "cat $INPUT_LOG_PATH" 2>/dev/null) && echo "$out" 1> "${OUTPUT_LOG_PATH}/multus-log-$nodefilename.log"
        done
      }
      

      This could be resolved with something similar to this:

      CLUSTER_NODES="${@:-$(oc get node -l node-role.kubernetes.io/master -o json | jq -r '.items[] | select(.status.conditions[] | select(.type=="Ready" and .status=="True")) | "node/" + .metadata.name')}"
      /usr/bin/gather_multus_logs $CLUSTER_NODES
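
      As a rough illustration of the same filtering idea without a jq dependency, the sketch below parses the STATUS column of `oc get node --no-headers`. The helper names filter_ready_nodes and ready_master_nodes are hypothetical and are not part of the must-gather collection scripts. The sketch assumes the default column layout (NAME, STATUS, ...) and emits names in the node/<name> form that `-oname` produces, which is the form that `oc debug` and the sed expression in gather_multus_logs work with.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of must-gather.

# Reads `oc get node --no-headers` output on stdin
# (assumed columns: NAME STATUS ROLES AGE VERSION) and prints
# node/<name> for every node whose STATUS starts with "Ready".
# "Ready,SchedulingDisabled" is kept; "NotReady" is dropped.
filter_ready_nodes() {
  awk '$2 ~ /^Ready(,|$)/ { print "node/" $1 }'
}

# Lists the master nodes by label and keeps only the Ready ones.
ready_master_nodes() {
  oc get node -l node-role.kubernetes.io/master --no-headers | filter_ready_nodes
}
```

      With helpers like these, the assignment could read CLUSTER_NODES="${@:-$(ready_master_nodes)}". Cordoned nodes are kept on purpose, since their Ready condition is still True, which matches what the jq filter selects.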

      [1] - https://access.redhat.com/solutions/6962230
      [2] - https://issues.redhat.com/browse/OCPBUGS-29940

              dosmith Douglas Smith
              rhn-support-jclaretm Jorge Claret Membrado
              Ross Brattain Ross Brattain
              Darragh Fitzmaurice Darragh Fitzmaurice