OCPBUGS-39124

Failure to pull NTO image preventing startup of ocp-tuned-one-shot.service

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Critical
    • None
    • 4.14.z, 4.15.z, 4.16.z
    • Node Tuning Operator
    • None
    • 3
    • False
      * Previously, when the Node Tuning Operator (NTO) was configured using PerformanceProfiles, it created an `ocp-tuned-one-shot` systemd service that ran before the kubelet and blocked its startup until the service completed. The service did not use the cluster-wide proxy settings, so in environments that require a proxy, Podman could not fetch the NTO image. With this release, the service supports the cluster-wide proxy environment variables defined in `/etc/mco/proxy.env`. This allows Podman to pull NTO images in environments that need to use the HTTP(S) proxy for out-of-cluster connections. (https://issues.redhat.com/browse/OCPBUGS-39124[*OCPBUGS-39124*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-39005. The following is the description of the original issue:

      Hello Team,

       

      After a hard reboot of all nodes due to a power outage, the NTO image pull failed. This prevented "ocp-tuned-one-shot.service" from starting, which resulted in dependency failures for the kubelet and crio services:

      ------------

      journalctl_--no-pager

      Aug 26 17:07:46 ocp05 systemd[1]: Reached target The firstboot OS update has completed.
      Aug 26 17:07:46 ocp05 resolv-prepender.sh[3577]: NM resolv-prepender: Starting download of baremetal runtime cfg image
      Aug 26 17:07:46 ocp05 systemd[1]: Starting Writes IP address configuration so that kubelet and crio services select a valid node IP...
      Aug 26 17:07:46 ocp05 systemd[1]: Starting TuneD service from NTO image...
      Aug 26 17:07:46 ocp05 nm-dispatcher[3687]: NM resolv-prepender triggered by lo up.
      Aug 26 17:07:46 ocp05 resolv-prepender.sh[3644]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf4faeb258c222ba4e04806fd3a7373d3bc1f43a66e141d4b7ece0307f597c72...
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + [[ lo == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + '[' -z ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + echo 'Not a DHCP4 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: Not a DHCP4 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3720]: + exit 0
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + '[' -z '' ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + echo 'Not a DHCP6 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: Not a DHCP6 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3722]: + exit 0
      Aug 26 17:07:46 ocp05 bash[3655]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cf4faeb258c222ba4e04806fd3a7373d3bc1f43a66e141d4b7ece0307f597c72...
      Aug 26 17:07:46 ocp05 podman[3661]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26...
      Aug 26 17:07:46 ocp05 podman[3661]: Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.112.227.10:53: server misbehaving
      Aug 26 17:07:46 ocp05 systemd[1]: ocp-tuned-one-shot.service: Main process exited, code=exited, status=125/n/a
      Aug 26 17:07:46 ocp05 nm-dispatcher[3793]: NM resolv-prepender triggered by brtrunk up.
      Aug 26 17:07:46 ocp05 systemd[1]: ocp-tuned-one-shot.service: Failed with result 'exit-code'.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + [[ brtrunk == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + '[' -z ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + echo 'Not a DHCP4 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: Not a DHCP4 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3803]: + exit 0
      Aug 26 17:07:46 ocp05 systemd[1]: Failed to start TuneD service from NTO image.
      Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Dependencies necessary to run kubelet.
      Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Kubernetes Kubelet.
      Aug 26 17:07:46 ocp05 systemd[1]: kubelet.service: Job kubelet.service/start failed with result 'dependency'.
      Aug 26 17:07:46 ocp05 systemd[1]: Dependency failed for Container Runtime Interface for OCI (CRI-O).
      Aug 26 17:07:46 ocp05 systemd[1]: crio.service: Job crio.service/start failed with result 'dependency'.
      Aug 26 17:07:46 ocp05 systemd[1]: kubelet-dependencies.target: Job kubelet-dependencies.target/start failed with result 'dependency'.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + '[' -z '' ']'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + echo 'Not a DHCP6 address. Ignoring.'
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: Not a DHCP6 address. Ignoring.
      Aug 26 17:07:46 ocp05 nm-dispatcher[3804]: + exit 0

      -----------

      -----------

      $ oc get proxy cluster -o yaml
        status:
          httpProxy: http://proxy_ip:8080
          httpsProxy: http://proxy_ip:8080

      $ cat /etc/mco/proxy.env
      HTTP_PROXY=http://proxy_ip:8080
      HTTPS_PROXY=http://proxy_ip:8080

      -----------

      -----------
      × ocp-tuned-one-shot.service - TuneD service from NTO image
           Loaded: loaded (/etc/systemd/system/ocp-tuned-one-shot.service; enabled; preset: disabled)
           Active: failed (Result: exit-code) since Mon 2024-08-26 17:07:46 UTC; 2h 30min ago
         Main PID: 3661 (code=exited, status=125)

      Aug 26 17:07:46 ocp05 podman[3661]: Error: initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b6ace44ba73bc0cef451bcf755c7fcddabe66b79df649058dc4b263e052ae26: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 10.112.227.10:53: server misbehaving
      -----------

      • The customer has a proxy configured in their environment. However, the nodes cannot start after a hard reboot of all nodes because NTO appears to ignore the cluster-wide proxy settings. To work around the NTO image pull issue, the customer had to add the proxy variables to /etc/systemd/system.conf manually.
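
A narrower alternative to editing /etc/systemd/system.conf would be a per-unit systemd drop-in that loads /etc/mco/proxy.env only for ocp-tuned-one-shot.service. The following is a minimal sketch under stated assumptions, not the change shipped with the fix: the function names and the drop-in file name 10-proxy-env.conf are hypothetical, and the optional root argument exists only so the script can be tried outside a node.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the drop-in file name are hypothetical,
# not taken from the issue or from the shipped fix.

# Print a systemd drop-in that loads the proxy variables for the service.
# The leading "-" makes systemd ignore the file if it does not exist.
render_proxy_dropin() {
  local env_file="${1:-/etc/mco/proxy.env}"
  printf '[Service]\nEnvironmentFile=-%s\n' "$env_file"
}

# Write the drop-in into the unit's drop-in directory and print its path.
# An optional root prefix (for example a temp dir) allows a dry run.
install_proxy_dropin() {
  local root="${1:-}"
  local dir="${root}/etc/systemd/system/ocp-tuned-one-shot.service.d"
  mkdir -p "$dir"
  render_proxy_dropin > "${dir}/10-proxy-env.conf"
  echo "${dir}/10-proxy-env.conf"
}
```

On a node this would be followed by `systemctl daemon-reload` so that systemd picks up the drop-in. In a managed cluster, such a file would normally be delivered through a MachineConfig rather than written by hand.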

              jmencak Jiri Mencak
              openshift-crt-jira-prow OpenShift Prow Bot
              Liquan Cui Liquan Cui