OpenShift Bugs / OCPBUGS-42061

Failure to pull NTO image preventing startup of ocp-tuned-one-shot.service


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Critical
    • Versions: 4.14.z, 4.15.z, 4.16.z
    • Component: Node Tuning Operator
    • Release note:
      * Previously, when the Node Tuning Operator (NTO) was configured by using PerformanceProfiles, it created an `ocp-tuned-one-shot` systemd service that runs before the kubelet and blocks its startup. The service invokes Podman with the NTO image, and if the image was not present on the node, Podman tried to pull it. In environments that require a proxy for out-of-cluster connections, the pull failed because the cluster-wide proxy settings were not used. With this release, support is added for the cluster-wide proxy environment variables defined in `/etc/mco/proxy.env`, so Podman can pull the NTO image in environments that use proxies for out-of-cluster connections. (link:https://issues.redhat.com/browse/OCPBUGS-42061[*OCPBUGS-42061*])
      ______________
      When the Node Tuning Operator (NTO) is configured using PerformanceProfiles, it creates the ocp-tuned-one-shot systemd service, which runs before the kubelet and blocks its startup. The service invokes podman with the NTO image; if the image is not present, podman tries to pull it. This release adds support for the cluster-wide proxy environment variables defined in "/etc/mco/proxy.env", which allows podman to pull the NTO image in environments that need an HTTP(S) proxy for out-of-cluster connections.
    • Bug Fix
    • In Progress

      This is a clone of issue OCPBUGS-39124. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-39005. The following is the description of the original issue:

      Hello Team,

       

      After a hard reboot of all nodes due to a power outage, the NTO image pull fails, which prevents "ocp-tuned-one-shot.service" from starting and results in dependency failures for the kubelet and crio services:

      ------------

      journalctl_--no-pager

      Aug 26 17:07:46 ocp05 systemd[1]: Reached target The firstboot OS update has completed.
      Aug 26 17:07:46 ocp05 resolv-prepender.sh[3577]: NM resolv-prepender: Starting download of baremetal runtime cfg image
      Aug 26 17:07:46 ocp05 systemd[1]: Starting Writes IP address configuration so that kubelet and crio services select a valid node IP...
      Aug 26 17:07:46 ocp05 systemd[1]: Starting TuneD service from NTO image...
      Aug 26 17:07:46 ocp05 nm-dispatcher[3687]: NM resolv-prepender triggered by lo up.
      Aug 26 17:07:46 ocp05 resolv-prepender.sh[3644]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf4faeb258c222ba4e04806fd3a7373d3bc1f43a66e141d4b7ece0307f597c72...
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + [[ lo == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + '[' -z ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + echo 'Not a DHCP4 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: Not a DHCP4 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + exit 0
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + '[' -z '' ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + echo 'Not a DHCP6 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: Not a DHCP6 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + exit 0
      Aug 26 17:07:46 ocp05 bash[3655]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf4faeb258c222ba4e04806fd3a7373d3bc1f43a66e141d4b7ece0307f597c72...
      Aug 26 17:07:46 ocp05 podman[3661]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26...
      Aug 26 17:07:46 ocp05 podman[3661]: Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.112.227.10:53: server misbehaving
      Aug 26 17:07:46 ocp05 systemd[1]: ocp-tuned-one-shot.service: Main process exited, code=exited, status=125/n/a
      Aug 26 17:07:46 ocp05 nm-dispatcher[3793]: NM resolv-prepender triggered by brtrunk up.
      Aug 26 17:07:46 ocp05 systemd[1]: ocp-tuned-one-shot.service: Failed with result 'exit-code'.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + [[ brtrunk == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + '[' -z ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + echo 'Not a DHCP4 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: Not a DHCP4 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + exit 0
      Aug 26 17:07:46 ocp05 systemd[1]: Failed to start TuneD service from NTO image.
      Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Dependencies necessary to run kubelet.
      Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Kubernetes Kubelet.
      Aug 26 17:07:46 ocp05 systemd[1]: kubelet.service: Job kubelet.service/start failed with result 'dependency'.
      Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Container Runtime Interface for OCI (CRI-O).
      Aug 26 17:07:46 ocp05 systemd[1]: crio.service: Job crio.service/start failed with result 'dependency'.
      Aug 26 17:07:46 ocp05 systemd[1]: kubelet-dependencies.target: Job kubelet-dependencies.target/start failed with result 'dependency'.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + '[' -z '' ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + echo 'Not a DHCP6 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: Not a DHCP6 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + exit 0

      -----------
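The failure chain in this excerpt is: the registry ping for quay.io fails on a DNS lookup, `ocp-tuned-one-shot.service` exits with status 125, and `kubelet.service`, `crio.service` and `kubelet-dependencies.target` then fail with result 'dependency'. When the same outage hits many nodes, a small script can pull that chain out of each saved journal. The Python below is a minimal sketch: `summarize_boot_failure` and `BootFailureSummary` are hypothetical names, and the regular expressions are assumptions based only on the message wording shown above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Patterns derived from the journal lines in this report (assumption: the
# wording is the same on other affected nodes and releases).
REGISTRY_ERROR = re.compile(r"pinging container registry (\S+): (.+)$")
TUNED_EXIT = re.compile(
    r"ocp-tuned-one-shot\.service: Main process exited, code=exited, status=(\d+)"
)
DEPENDENCY_FAILED = re.compile(
    r"(\S+\.(?:service|target)): Job \S+ failed with result 'dependency'"
)


@dataclass
class BootFailureSummary:
    registry_errors: List[Tuple[str, str]] = field(default_factory=list)  # (registry, error text)
    tuned_exit_status: Optional[int] = None
    dependency_failures: List[str] = field(default_factory=list)  # unit names, journal order


def summarize_boot_failure(journal_text: str) -> BootFailureSummary:
    """Hypothetical helper: extract the NTO image pull failure chain from journal text."""
    summary = BootFailureSummary()
    for line in journal_text.splitlines():
        if m := REGISTRY_ERROR.search(line):
            summary.registry_errors.append((m.group(1), m.group(2)))
        elif m := TUNED_EXIT.search(line):
            summary.tuned_exit_status = int(m.group(1))
        elif m := DEPENDENCY_FAILED.search(line):
            summary.dependency_failures.append(m.group(1))
    return summary
```

Run against the excerpt above, it should report `quay.io` as the unreachable registry, exit status 125, and the kubelet, crio and kubelet-dependencies failures in journal order. A plain-text regex scan fits here only because the report contains plain text; on live nodes, filtering structured `journalctl -o json` output would be more robust.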

      -----------

      $ oc get proxy.config cluster -o yaml
        status:
          httpProxy: http://proxy_ip:8080
          httpsProxy: http://proxy_ip:8080

      $ cat /etc/mco/proxy.env
      HTTP_PROXY=http://proxy_ip:8080
      HTTPS_PROXY=http://proxy_ip:8080

      -----------
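Before concluding that the service ignored the proxy, it helps to confirm that `/etc/mco/proxy.env` on the node matches the `status` of the cluster Proxy object. A minimal Python sketch, assuming the file uses plain KEY=VALUE lines as shown above (systemd's own `EnvironmentFile=` parser also handles escapes and line continuations, which this sketch does not); `parse_env_file`, `proxy_mismatches` and the `NO_PROXY` mapping are hypothetical, and `status` is assumed to be the `status` block already loaded from the `oc` output into a dictionary.

```python
from typing import Dict, List, Tuple

# Assumed correspondence between the cluster Proxy status fields and the
# variable names in /etc/mco/proxy.env (HTTP_PROXY and HTTPS_PROXY appear in
# this report; NO_PROXY is included as an assumption).
STATUS_TO_ENV = {
    "httpProxy": "HTTP_PROXY",
    "httpsProxy": "HTTPS_PROXY",
    "noProxy": "NO_PROXY",
}


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments.

    Surrounding double quotes are removed from values; other quoting rules
    (escapes, line continuations) are deliberately not handled.
    """
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # lines without '=' are ignored in this sketch
            env[key.strip()] = value.strip().strip('"')
    return env


def proxy_mismatches(
    status: Dict[str, str], env: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (variable, expected, found) for each set status field not matched in env."""
    problems: List[Tuple[str, str, str]] = []
    for status_field, var in STATUS_TO_ENV.items():
        expected = status.get(status_field)
        if not expected:
            continue  # field unset on the cluster Proxy object
        found = env.get(var, "")
        if found != expected:
            problems.append((var, expected, found))
    return problems
```

With the values quoted above (both set to `http://proxy_ip:8080`), `proxy_mismatches` returns an empty list, which is consistent with the report: the proxy file on the node was correct, but the image pull still went directly to quay.io.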

      -----------
      × ocp-tuned-one-shot.service - TuneD service from NTO image
           Loaded: loaded (/etc/systemd/system/ocp-tuned-one-shot.service; enabled; preset: disabled)
           Active: failed (Result: exit-code) since Mon 2024-08-26 17:07:46 UTC; 2h 30min ago
         Main PID: 3661 (code=exited, status=125)

      Aug 26 17:07:46 ocp05 podman[3661]: Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.112.227.10:53: server misbehaving
      -----------

      • The customer has a proxy configured in their environment. However, after a hard reboot of all nodes, the nodes cannot start because NTO appears to ignore the cluster-wide proxy settings. To work around the NTO image pull failure, the customer had to add the proxy variables to /etc/systemd/system.conf manually.
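The workaround above relies on the `DefaultEnvironment=` option in the `[Manager]` section of `/etc/systemd/system.conf`, which sets environment variables for the processes the system manager starts. The customer's exact edit is not shown in this report, so the Python below is only a hedged sketch that renders such a stanza from the proxy variables (for example, the values from `/etc/mco/proxy.env`); `render_default_environment` is a hypothetical helper.

```python
from typing import Mapping


def render_default_environment(variables: Mapping[str, str]) -> str:
    """Render a [Manager] stanza setting DefaultEnvironment= for the given variables.

    Each assignment is wrapped in double quotes, which systemd accepts; values
    containing a double quote or backslash are rejected to keep the sketch simple.
    """
    parts = []
    for name, value in variables.items():
        if '"' in value or "\\" in value:
            raise ValueError(f"unsupported character in value of {name}")
        parts.append(f'"{name}={value}"')
    return "[Manager]\nDefaultEnvironment=" + " ".join(parts) + "\n"
```

For the values in this report it yields `DefaultEnvironment="HTTP_PROXY=http://proxy_ip:8080" "HTTPS_PROXY=http://proxy_ip:8080"` under `[Manager]`. Because `DefaultEnvironment=` applies to every process the system manager starts, this workaround is broader than loading `/etc/mco/proxy.env` for a single unit.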

            jmencak Jiri Mencak
            openshift-crt-jira-prow OpenShift Prow Bot
            Liquan Cui Liquan Cui
