OCPBUGS-63450

kubelet fails to start on master node during upgrade

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • 4.18.z
    • Node Tuning Operator
    • None
    • Quality / Stability / Reliability
    • False
    • None
    • None
    • Critical
    • None
    • None
    • None
    • None
    • contract-priority
    • In Progress
    • Bug Fix
    • The NTO-owned systemd unit ocp-tuned-one-shot.service runs and blocks
      kubelet. This service is important for reducing the number of
      inter-processor interrupts that interfere with low-latency workloads.
      It was possible for ocp-tuned-one-shot.service to cause a dependency
      failure for kubelet and prevent it from running. The fix makes it
      impossible for ocp-tuned-one-shot.service to fail.
    • None
    • None
    • None
    • None
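
      The dependency failure described in the release note can be illustrated with a toy model. This is only a sketch: the issue does not show the unit files, so the hard-dependency edge from kubelet.service to ocp-tuned-one-shot.service, the `Unit` class, and the `start` and `boot` helpers are all hypothetical. It models the general systemd behaviour where a unit with a hard, ordered dependency is not started if that dependency fails.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    name: str
    start_ok: bool = True  # does the unit's own start job succeed?
    requires: list = field(default_factory=list)  # hard, ordered dependencies
    wants: list = field(default_factory=list)     # soft, ordered dependencies


def start(unit, units, results=None):
    """Start `unit` after its dependencies.

    Returns a dict mapping unit name to 'done', 'failed' or 'dependency'.
    """
    results = {} if results is None else results
    if unit.name in results:
        return results
    for dep in unit.requires + unit.wants:
        start(units[dep], units, results)
    if any(results[dep] != "done" for dep in unit.requires):
        # A hard dependency did not start, so this unit is never attempted.
        results[unit.name] = "dependency"
    elif not unit.start_ok:
        results[unit.name] = "failed"
    else:
        results[unit.name] = "done"
    return results


def boot(one_shot_ok):
    """Hypothetical wiring: kubelet hard-depends on the one-shot unit."""
    units = {
        "ocp-tuned-one-shot.service": Unit(
            "ocp-tuned-one-shot.service", start_ok=one_shot_ok
        ),
        "kubelet.service": Unit(
            "kubelet.service", requires=["ocp-tuned-one-shot.service"]
        ),
    }
    return start(units["kubelet.service"], units)


if __name__ == "__main__":
    print("before fix:", boot(one_shot_ok=False))
    print("after fix: ", boot(one_shot_ok=True))
```

      In this model, a one-shot unit that can never fail (`one_shot_ok=True`) always lets kubelet start, which matches the intent of the fix as the release note describes it.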

      This is a clone of issue OCPBUGS-63334. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-62940. The following is the description of the original issue:

      Environment:
      Baremetal MNO + ODF
      Description of problem:

      The customer is testing environment upgrades:

      • From 4.16.44 > 4.17.37 > 4.18.22, they found no issues at all.
      • From 4.16.44 > 4.17.37 > 4.18.24, kubelet fails to start on a master node 50% of the time, possibly because ocp-tuned-one-shot.service fails repeatedly:

       

      Oct 03 11:45:04 master2 systemd[1]: ocp-tuned-one-shot.service: Failed with result 'exit-code'.
      Oct 03 11:45:04 master2 systemd[1]: Failed to start TuneD service from NTO image. 

      This looks close to the scenario described in KCS#7128296 (unfortunately without any linked case or bug).
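
      When triaging other nodes, the failure signature from the journal excerpt above can be searched for mechanically. The following is a minimal sketch, assuming the journal has been exported as plain text in the same short format as the two lines quoted in this report. The function name `find_unit_failures` and the regular expression are illustrative, not part of any Red Hat tooling.

```python
import re

# Matches a systemd "Failed with result" line in the short journal format
# quoted in this report, for example:
# "Oct 03 11:45:04 master2 systemd[1]: ocp-tuned-one-shot.service: Failed with result 'exit-code'."
FAILED_RE = re.compile(
    r"^(?P<ts>\w{3} \d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) systemd\[1\]: "
    r"(?P<unit>\S+\.service): Failed with result '(?P<result>[^']+)'\.\s*$"
)


def find_unit_failures(lines, unit="ocp-tuned-one-shot.service"):
    """Return a (timestamp, host, result) tuple for each failure line of `unit`."""
    hits = []
    for line in lines:
        match = FAILED_RE.match(line)
        if match and match.group("unit") == unit:
            hits.append(
                (match.group("ts"), match.group("host"), match.group("result"))
            )
    return hits


# The two journal lines from this report, used as sample input.
SAMPLE = [
    "Oct 03 11:45:04 master2 systemd[1]: ocp-tuned-one-shot.service: "
    "Failed with result 'exit-code'.",
    "Oct 03 11:45:04 master2 systemd[1]: Failed to start TuneD service from NTO image. ",
]

if __name__ == "__main__":
    for ts, host, result in find_unit_failures(SAMPLE):
        print(f"{ts} {host}: ocp-tuned-one-shot.service failed ({result})")
```

      Only the first sample line is reported, because the second line does not name the unit. Counting the hits per host gives a rough view of how often the service failed on each node.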

      How reproducible:

      50% of the time

      Steps to Reproduce:

      Upgrade from 4.16.44 > 4.17.37 > 4.18.24

      Actual results:

      50% of the time, kubelet fails to start on master nodes.

              jmencak Jiri Mencak
              rh-ee-fpiccion Flavio Piccioni
              None
              None
              Liquan Cui
              None