  1. OpenShift Bugs
  2. OCPBUGS-54301

Node in NotReady state due to inactive kubelet and crio services, nodeip-configuration service fails in RHOCP4


    • Quality / Stability / Reliability
    • Important

      Description of problem:

      A worker node went into the NotReady state.
      The kubelet and CRI-O services were inactive, and the nodeip-configuration service was in a failed state due to a shell syntax error.
      A known issue for OpenShift 4.7 is related to this bug (https://bugzilla.redhat.com/show_bug.cgi?id=1894477), and the customer is now observing a similar issue in OpenShift 4.15.
      The issue is still occurring.
      
      Restarting the kubelet and CRI-O services did not help; both remained stuck. Rebooting the node did not resolve the issue either.
      On further investigation, we found that manually executing the script "./usr/local/bin/configure-ip-forwarding.sh" brought the kubelet and CRI-O services to the active state, and the node then transitioned to Ready.
      
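      Upon checking `cat /etc/systemd/system/nodeip-configuration.service`, I noted that the root cause could be linked to the failure of the nodeip-configuration service and to the IP forwarding setup. The unit runs configure-ip-forwarding.sh as an ExecStartPost step, which systemd executes only after ExecStart succeeds, so a failed service start likely leaves IP forwarding unconfigured.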

      Version-Release number of selected component (if applicable):

      4.15.43

      Actual results:

      The worker node remains in the NotReady state, and the kubelet and CRI-O services are inactive. The nodeip-configuration service fails, and the node recovers only after configure-ip-forwarding.sh is executed manually.

      Expected results:

      The worker node should reach the Ready state automatically, and the kubelet and CRI-O services should be active after boot without a manual run of configure-ip-forwarding.sh.

      Additional info:

      The node IP configuration service (nodeip-configuration) fails to start, which leaves the kubelet and CRI-O services inactive.
      ~~~
      Mar 27 09:34:11 li1vchdcpwrk6p.qnb.bnk systemd[1]: Starting Writes IP address configuration so that kubelet and crio services select a valid node IP...
      Mar 27 09:34:11 li1vchdcpwrk6p.qnb.bnk bash[14737]: /bin/bash: -c: line 1: syntax error near unexpected token `done'
      Mar 27 09:34:11 li1vchdcpwrk6p.qnb.bnk bash[14737]: /bin/bash: -c: line 1: `    until    /usr/bin/podman run --rm    --authfile /var/lib/kubelet/config.json    --net=host    --security-opt label=disable    --volume /etc/systemd/system:/etc/systemd/system    --volume /run/nodeip-configuration:/run/nodeip-configuration    quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d069765e098835ae1a98164c70425c17c3995c28873bf5e292889a100caf3ea5    node-ip    set    --platform VSphere    --user-managed-lb    --retry-on-failure        do    sleep 5;    done'
      Mar 27 09:34:11 li1vchdcpwrk6p.qnb.bnk systemd[1]: nodeip-configuration.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
      Mar 27 09:34:11 li1vchdcpwrk6p.qnb.bnk systemd[1]: nodeip-configuration.service: Failed with result 'exit-code'.
      Mar 27 09:34:11 li1vchdcpwrk6p.qnb.bnk systemd[1]: Failed to start Writes IP address configuration so that kubelet and crio services select a valid node IP.
      ~~~
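
      In the joined command line logged above, there appears to be no `;` (or newline) between `--retry-on-failure` and `do`. When that separator is missing, bash treats `do` as one more argument to the command, so the later `done` is unexpected. The sketch below reproduces that parse error in isolation. It is an illustration only: `true` is a hypothetical stand-in for the podman invocation, and the variable names are not taken from the unit file.

```shell
#!/bin/bash
# Hypothetical minimal reproduction; `true` stands in for the podman command.
# Broken: no separator before `do`, as in the logged command line.
broken='until true --retry-on-failure do sleep 5; done'
# Fixed: a `;` terminates the until-condition before `do`.
fixed='until true --retry-on-failure; do sleep 5; done'

# `bash -n` parses the command string without executing it.
if bash -n -c "$broken" 2>/dev/null; then broken_ok=yes; else broken_ok=no; fi
if bash -n -c "$fixed" 2>/dev/null; then fixed_ok=yes; else fixed_ok=no; fi

echo "broken parses: $broken_ok"
echo "fixed parses: $fixed_ok"
```

      The same `bash -n -c` check could be applied to the script body extracted from the ExecStart line of the unit on an affected node, to confirm whether the separator is missing there.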
      
      The issue has been noted in OpenShift 4.7 as well (see https://bugzilla.redhat.com/show_bug.cgi?id=1894477).
      
      Executing ./usr/local/bin/configure-ip-forwarding.sh manually resolves the issue temporarily.
      ~~~
      cat etc/systemd/system/nodeip-configuration.service
      sleep 5; \
        done"
      ExecStart=/bin/systemctl daemon-reload
      ExecStartPre=/bin/mkdir -p /run/nodeip-configuration
      ExecStartPost=+/usr/local/bin/configure-ip-forwarding.sh
      StandardOutput=journal+console
      StandardError=journal+console
      ~~~
      

              bnemec@redhat.com Benjamin Nemec
              rhn-support-sdharma Suruchi Dharma
              Cameron Meadors Cameron Meadors