OpenShift Bugs: OCPBUGS-39005

Failure to pull NTO image preventing startup of ocp-tuned-one-shot.service

    • When the Node Tuning Operator (NTO) is configured using PerformanceProfiles, it creates the ocp-tuned-one-shot systemd service, which runs before the kubelet and blocks its startup. The service invokes podman, which uses the NTO image. If the NTO image is not present locally, podman tries to pull it. This release adds support for the cluster-wide proxy environment variables defined in "/etc/mco/proxy.env", which allows podman to pull the NTO image in environments that require an HTTP(S) proxy for out-of-cluster connections.
    • Bug Fix
    • In Progress

      Hello Team,

       

      After a hard reboot of all nodes due to a power outage, the NTO image pull fails, which prevents "ocp-tuned-one-shot.service" from starting and results in dependency failures for the kubelet and crio services:

      ------------

      journalctl_--no-pager

      Aug 26 17:07:46 ocp05 systemd[1]: Reached target The firstboot OS update has completed.
      Aug 26 17:07:46 ocp05 resolv-prepender.sh[3577]: NM resolv-prepender: Starting download of baremetal runtime cfg image
      Aug 26 17:07:46 ocp05 systemd[1]: Starting Writes IP address configuration so that kubelet and crio services select a valid node IP...
      Aug 26 17:07:46 ocp05 systemd[1]: Starting TuneD service from NTO image...
      Aug 26 17:07:46 ocp05 nm-dispatcher[3687]: NM resolv-prepender triggered by lo up.
      Aug 26 17:07:46 ocp05 resolv-prepender.sh[3644]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf4faeb258c222ba4e04806fd3a7373d3bc1f43a66e141d4b7ece0307f597c72...
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + [[ lo == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + '[' -z ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + echo 'Not a DHCP4 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: Not a DHCP4 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + exit 0
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + '[' -z '' ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + echo 'Not a DHCP6 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: Not a DHCP6 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + exit 0
      Aug 26 17:07:46 ocp05 bash[3655]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf4faeb258c222ba4e04806fd3a7373d3bc1f43a66e141d4b7ece0307f597c72...
      Aug 26 17:07:46 ocp05 podman[3661]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26...
      Aug 26 17:07:46 ocp05 podman[3661]: Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.112.227.10:53: server misbehaving
      Aug 26 17:07:46 ocp05 systemd[1]: ocp-tuned-one-shot.service: Main process exited, code=exited, status=125/n/a
      Aug 26 17:07:46 ocp05 nm-dispatcher[3793]: NM resolv-prepender triggered by brtrunk up.
      Aug 26 17:07:46 ocp05 systemd[1]: ocp-tuned-one-shot.service: Failed with result 'exit-code'.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + [[ brtrunk == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + '[' -z ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + echo 'Not a DHCP4 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: Not a DHCP4 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + exit 0
      Aug 26 17:07:46 ocp05 systemd[1]: Failed to start TuneD service from NTO image.
      Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Dependencies necessary to run kubelet.
      Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Kubernetes Kubelet.
      Aug 26 17:07:46 ocp05 systemd[1]: kubelet.service: Job kubelet.service/start failed with result 'dependency'.
      Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Container Runtime Interface for OCI (CRI-O).
      Aug 26 17:07:46 ocp05 systemd[1]: crio.service: Job crio.service/start failed with result 'dependency'.
      Aug 26 17:07:46 ocp05 systemd[1]: kubelet-dependencies.target: Job kubelet-dependencies.target/start failed with result 'dependency'.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + '[' -z '' ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + echo 'Not a DHCP6 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: Not a DHCP6 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + exit 0

      -----------
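The dependency chain in the journal excerpt above can be pulled out mechanically. The following is a minimal sketch, not part of NTO or any support tooling: the function name `summarize_failures` and both regular expressions are illustrative assumptions, written against the systemd message wording seen in this excerpt only.

```python
import re

# Matches e.g. "systemd[1]: ocp-tuned-one-shot.service: Failed with result 'exit-code'."
UNIT_FAILED = re.compile(
    r"systemd\[1\]: (?P<unit>\S+): Failed with result '(?P<result>[^']+)'"
)
# Matches e.g. "systemd[1]: kubelet.service: Job kubelet.service/start failed with result 'dependency'."
JOB_FAILED = re.compile(
    r"systemd\[1\]: (?P<unit>\S+): Job \S+ failed with result '(?P<result>[^']+)'"
)


def summarize_failures(journal_text):
    """Split failed units into those that failed themselves and those blocked by a dependency."""
    root_causes, blocked = [], []
    for line in journal_text.splitlines():
        match = UNIT_FAILED.search(line)
        if match:
            root_causes.append(match.group("unit"))
            continue
        match = JOB_FAILED.search(line)
        if match and match.group("result") == "dependency":
            blocked.append(match.group("unit"))
    return root_causes, blocked


# Four lines copied from the journal excerpt in this report.
sample = """\
Aug 26 17:07:46 ocp05 systemd[1]: ocp-tuned-one-shot.service: Failed with result 'exit-code'.
Aug 26 17:07:46 ocp05 systemd[1]: kubelet.service: Job kubelet.service/start failed with result 'dependency'.
Aug 26 17:07:46 ocp05 systemd[1]: crio.service: Job crio.service/start failed with result 'dependency'.
Aug 26 17:07:46 ocp05 systemd[1]: kubelet-dependencies.target: Job kubelet-dependencies.target/start failed with result 'dependency'.
"""

root_causes, blocked = summarize_failures(sample)
print("failed on its own:", root_causes)
print("blocked by dependency:", blocked)
```

On this excerpt the only unit that failed on its own is ocp-tuned-one-shot.service; kubelet.service, crio.service, and kubelet-dependencies.target are listed as blocked, which matches the description above.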

      -----------

      $ oc get proxy.config.openshift.io cluster -o yaml
        status:
          httpProxy: http://proxy_ip:8080
          httpsProxy: http://proxy_ip:8080

      $ cat /etc/mco/proxy.env
      HTTP_PROXY=http://proxy_ip:8080
      HTTPS_PROXY=http://proxy_ip:8080

      -----------
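The proxy.env file shown above uses plain KEY=VALUE lines, so its variables can be merged into the environment of a child process such as podman. The sketch below only illustrates that idea; this report does not show how NTO implements it, and the helper names `parse_env_file` and `child_environment` are hypothetical.

```python
import os


def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore the line
        # Simplification: surrounding double quotes are dropped, no escape handling.
        env[key.strip()] = value.strip().strip('"')
    return env


def child_environment(proxy_text, base=None):
    """Return a copy of the base environment with the proxy variables merged in."""
    merged = dict(os.environ if base is None else base)
    merged.update(parse_env_file(proxy_text))
    return merged


# Same content as the /etc/mco/proxy.env shown in this report.
sample = "HTTP_PROXY=http://proxy_ip:8080\nHTTPS_PROXY=http://proxy_ip:8080\n"
env = child_environment(sample, base={"PATH": "/usr/bin"})
print(env["HTTPS_PROXY"])
```

A caller could then pass the result to a child process, for example `subprocess.run(["podman", "pull", image], env=env)`, where `image` is a placeholder for the image reference.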

      -----------
      × ocp-tuned-one-shot.service - TuneD service from NTO image
           Loaded: loaded (/etc/systemd/system/ocp-tuned-one-shot.service; enabled; preset: disabled)
           Active: failed (Result: exit-code) since Mon 2024-08-26 17:07:46 UTC; 2h 30min ago
         Main PID: 3661 (code=exited, status=125)

      Aug 26 17:07:46 ocp05 podman[3661]: Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.112.227.10:53: server misbehaving
      -----------

      • The customer has a proxy configured in their environment. However, the nodes cannot start after a hard reboot of all nodes because NTO appears to ignore the cluster-wide proxy settings. To work around the NTO image pull issue, the customer has to add the proxy variables to /etc/systemd/system.conf manually.
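The report does not say which setting the customer added to /etc/systemd/system.conf. One plausible form, assumed here, is the `DefaultEnvironment=` setting in the `[Manager]` section, which systemd applies to the services it starts. The sketch below renders such a line from the proxy values shown earlier; the function name `default_environment_line` is hypothetical.

```python
def default_environment_line(proxy_env):
    """Render proxy variables as a systemd DefaultEnvironment= setting."""
    # Each assignment is double-quoted so that it stays a single word.
    pairs = " ".join(f'"{key}={value}"' for key, value in proxy_env.items())
    return f"DefaultEnvironment={pairs}"


# Values taken from the /etc/mco/proxy.env shown in this report.
proxy_env = {
    "HTTP_PROXY": "http://proxy_ip:8080",
    "HTTPS_PROXY": "http://proxy_ip:8080",
}

print("[Manager]")
print(default_environment_line(proxy_env))
```

A manual edit of this kind is only a workaround. With proxy support added to the service as described in the release note, it should no longer be needed.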

            jmencak Jiri Mencak
            rhn-support-dgupte Dhananjay Gupte
            Liquan Cui Liquan Cui