OpenShift Bugs / OCPBUGS-63450

kubelet fails to start on master node during upgrade

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.18.0
    • 4.18.z
    • Node Tuning Operator
    • None
    • Quality / Stability / Reliability
    • False
    • None
    • Critical
    • None
    • None
    • None
    • contract-priority
    • Done
    • Bug Fix
    • Before this update, when you ran the `ocp-tuned-one-shot.service` systemd unit that was owned by the Node Tuning Operator (NTO), a dependency failure might have occurred for the kubelet. As a consequence, the kubelet did not start. With this release, running the `ocp-tuned-one-shot.service` unit does not cause a dependency failure. As a result, the kubelet starts when you run the unit. (link:https://issues.redhat.com/browse/OCPBUGS-63450[OCPBUGS-63450])
    • None
    • None
    • None
    • None

      This is a clone of issue OCPBUGS-63334. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-62940. The following is the description of the original issue:

      Environment:
      Baremetal MNO + ODF
      Description of problem:

      The customer is testing environment upgrades:

      • From 4.16.44 > 4.17.37 > 4.18.22, they found no issues.
      • From 4.16.44 > 4.17.37 > 4.18.24, the kubelet fails to start on a master node 50% of the time, possibly because ocp-tuned-one-shot.service repeatedly fails:

       

      Oct 03 11:45:04 master2 systemd[1]: ocp-tuned-one-shot.service: Failed with result 'exit-code'.
      Oct 03 11:45:04 master2 systemd[1]: Failed to start TuneD service from NTO image. 

      This looks close to the scenario described in KCS#7128296 (unfortunately, without any linked case or bug).
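
When triaging a node after the upgrade, it can help to confirm that the journal contains the same failure shown in the log excerpt. The following is a minimal sketch, not part of the original report: it assumes the journal has been exported as plain text lines in the format shown above, and the function name `find_unit_failures` and the regular expression are illustrative.

```python
import re

# Matches systemd "Failed with result" lines in the format shown in the report.
# The pattern is an assumption based on the two sample lines only.
FAILED_UNIT = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+\.service): Failed with result '(?P<result>[^']+)'"
)


def find_unit_failures(journal_lines, unit="ocp-tuned-one-shot.service"):
    """Return (line, result) pairs for each recorded failure of the given unit."""
    failures = []
    for line in journal_lines:
        match = FAILED_UNIT.search(line)
        if match and match.group("unit") == unit:
            failures.append((line.strip(), match.group("result")))
    return failures


# The two log lines quoted in this issue.
sample = [
    "Oct 03 11:45:04 master2 systemd[1]: ocp-tuned-one-shot.service: Failed with result 'exit-code'.",
    "Oct 03 11:45:04 master2 systemd[1]: Failed to start TuneD service from NTO image.",
]

for line, result in find_unit_failures(sample):
    print(f"{result}: {line}")
```

The input could come from a file produced with `journalctl -u ocp-tuned-one-shot.service` on the affected node. Counting matches across several upgrade attempts would give a rough check of the reported failure rate.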

      How reproducible:

      50% of the time

      Steps to Reproduce:

      Upgrade from 4.16.44 > 4.17.37 > 4.18.24.

      Actual results:

      50% of the time, the kubelet fails to start on master nodes.

              jmencak Jiri Mencak
              rh-ee-fpiccion Flavio Piccioni
              None
              None
              Liquan Cui Liquan Cui
              Lluis Cavalle Lluis Cavalle
              Votes: 0
              Watchers: 4
