
Failure to pull NTO image preventing startup of ocp-tuned-one-shot.service

      Previously, configuring the Node Tuning Operator (NTO) using PerformanceProfiles created the ocp-tuned-one-shot systemd service, which runs before kubelet and blocks its execution. The systemd service invokes podman, which uses the NTO image. When the NTO image is not present, podman tries to fetch it. This release adds support for the cluster-wide proxy environment variables defined in "/etc/mco/proxy.env" and allows podman to pull the NTO image in environments that need an http(s) proxy for out-of-cluster connections. (link:https://issues.redhat.com/browse/OCPBUGS-39005[*OCPBUGS-39005*])
    • Bug Fix
    • Done

      Hello Team,

       

      After a hard reboot of all nodes due to a power outage, the NTO image pull fails. This prevents "ocp-tuned-one-shot.service" from starting and results in dependency failures for the kubelet and crio services:

      ------------

      journalctl_--no-pager

      Aug 26 17:07:46 ocp05 systemd[1]: Reached target The firstboot OS update has completed.
      Aug 26 17:07:46 ocp05 resolv-prepender.sh[3577]: NM resolv-prepender: Starting download of baremetal runtime cfg image
      Aug 26 17:07:46 ocp05 systemd[1]: Starting Writes IP address configuration so that kubelet and crio services select a valid node IP...
      Aug 26 17:07:46 ocp05 systemd[1]: Starting TuneD service from NTO image...
      Aug 26 17:07:46 ocp05 nm-dispatcher[3687]: NM resolv-prepender triggered by lo up.
      Aug 26 17:07:46 ocp05 resolv-prepender.sh[3644]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf4faeb258c222ba4e04806fd3a7373d3bc1f43a66e141d4b7ece0307f597c72...
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + [[ lo == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + '[' -z ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + echo 'Not a DHCP4 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: Not a DHCP4 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + exit 0
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + '[' -z '' ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + echo 'Not a DHCP6 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: Not a DHCP6 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + exit 0
      Aug 26 17:07:46 ocp05 bash[3655]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf4faeb258c222ba4e04806fd3a7373d3bc1f43a66e141d4b7ece0307f597c72...
      Aug 26 17:07:46 ocp05 podman[3661]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26...
      Aug 26 17:07:46 ocp05 podman[3661]: Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.112.227.10:53: server misbehaving
      Aug 26 17:07:46 ocp05 systemd[1]: ocp-tuned-one-shot.service: Main process exited, code=exited, status=125/n/a
      Aug 26 17:07:46 ocp05 nm-dispatcher[3793]: NM resolv-prepender triggered by brtrunk up.
      Aug 26 17:07:46 ocp05 systemd[1]: ocp-tuned-one-shot.service: Failed with result 'exit-code'.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + [[ brtrunk == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + '[' -z ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + echo 'Not a DHCP4 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: Not a DHCP4 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + exit 0
      Aug 26 17:07:46 ocp05 systemd[1]: Failed to start TuneD service from NTO image.
      Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Dependencies necessary to run kubelet.
      Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Kubernetes Kubelet.
      Aug 26 17:07:46 ocp05 systemd[1]: kubelet.service: Job kubelet.service/start failed with result 'dependency'.
      Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Container Runtime Interface for OCI (CRI-O).
      Aug 26 17:07:46 ocp05 systemd[1]: crio.service: Job crio.service/start failed with result 'dependency'.
      Aug 26 17:07:46 ocp05 systemd[1]: kubelet-dependencies.target: Job kubelet-dependencies.target/start failed with result 'dependency'.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + '[' -z '' ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + echo 'Not a DHCP6 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: Not a DHCP6 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + exit 0

      -----------
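The cascade in the journal excerpt above can be summarized by listing only the units that systemd reports as failed due to a dependency. The following is a minimal sketch; `dependency_failures` is a hypothetical helper name, and it assumes the journal line format shown in the excerpt.

```shell
#!/usr/bin/env bash
# Sketch only: print the description of every unit that systemd (PID 1)
# reports as "Dependency failed for ...". Reads journal text on stdin.
# dependency_failures is a hypothetical helper, not an existing tool.
dependency_failures() {
  # Keep only the unit description and drop the trailing period.
  sed -n 's/.*systemd\[1\]: Dependency failed for \(.*\)\.$/\1/p'
}
```

A possible invocation on an affected node would be `journalctl --no-pager -b | dependency_failures`, which for the excerpt above would list the kubelet dependencies target, the kubelet, and CRI-O.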

      -----------

      $ oc get proxy cluster -o yaml
        status:
          httpProxy: http://proxy_ip:8080
          httpsProxy: http://proxy_ip:8080

      $ cat /etc/mco/proxy.env
      HTTP_PROXY=http://proxy_ip:8080
      HTTPS_PROXY=http://proxy_ip:8080

      -----------

      -----------
      × ocp-tuned-one-shot.service - TuneD service from NTO image
           Loaded: loaded (/etc/systemd/system/ocp-tuned-one-shot.service; enabled; preset: disabled)
           Active: failed (Result: exit-code) since Mon 2024-08-26 17:07:46 UTC; 2h 30min ago
         Main PID: 3661 (code=exited, status=125)

      Aug 26 17:07:46 ocp05 podman[3661]: Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.112.227.10:53: server misbehaving
      -----------

      • The customer has a proxy configured in their environment. However, the nodes cannot start after a hard reboot of all nodes because NTO appears to ignore the cluster-wide proxy settings. To work around the NTO image pull issue, the customer has to add the proxy variables to /etc/systemd/system.conf manually.
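The idea of honoring the cluster-wide proxy settings before the image pull can be sketched as follows. This is an illustration under stated assumptions, not the NTO implementation: `load_proxy_env` is a hypothetical helper, and the actual fix may be implemented differently, for example with a systemd `EnvironmentFile=` directive in the service unit. The sketch assumes the file contains plain `KEY=value` lines like the /etc/mco/proxy.env shown above.

```shell
#!/usr/bin/env bash
# Sketch only: export proxy settings from an env file so that child
# processes started afterwards (for example podman) inherit them.
# load_proxy_env is a hypothetical helper, not part of NTO or MCO.
load_proxy_env() {
  local file="${1:-/etc/mco/proxy.env}"
  local key value
  # A missing file is not an error: clusters without a proxy have none.
  [ -r "$file" ] || return 0
  # Split each line at the first '='; the rest of the line is the value.
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in
      HTTP_PROXY|HTTPS_PROXY|NO_PROXY)
        export "$key=$value"
        ;;
    esac
  done < "$file"
}
```

With this helper, a wrapper could call `load_proxy_env` and then run `podman pull` for the NTO image, so the pull goes through the proxy without editing /etc/systemd/system.conf by hand.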

              jmencak Jiri Mencak
              rhn-support-dgupte Dhananjay Gupte
              Liquan Cui Liquan Cui
              Padraig OGrady Padraig OGrady