OpenShift Bugs / OCPBUGS-56442

OCP 4.18+ | Node Tuning Operator is marked as degraded during IPI wait-for-install process


    • Bug
    • Resolution: Unresolved
    • Critical
    • 4.18.z, 4.19.z, 4.20
    • Node Tuning Operator
    • Quality / Stability / Reliability
    • False
    • 3
    • Important
    • Rejected
    • In Progress
    • Release Note Not Required

      Description of problem:

      On OCP 4.18 and later, the Node Tuning Operator (NTO) is marked as degraded while the IPI installer is running its wait-for install-complete check.

      Version-Release number of selected component (if applicable):

      Seen in recent OCP 4.18, 4.19 and 4.20 builds. Most occurrences are in nightly builds, but we have also seen it in one RC (4.19.0 rc.2) and one GA release (4.18.14).

      How reproducible:

      Intermittent: the issue does not appear on every deployment.

      Steps to Reproduce:

      1. Deploy OCP with the IPI installer on bare-metal nodes (3 master and 4 worker nodes).
      2. Wait for the bootstrap check to pass (using /usr/local/bin/openshift-install --dir /home/kni/clusterconfigs --log-level debug wait-for bootstrap-complete).
      3. The install check (using /usr/local/bin/openshift-install --dir /home/kni/clusterconfigs --log-level debug wait-for install-complete) does not pass, and the cluster reports issues related to Node Tuning Operator degradation.
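
      Steps 2 and 3 can be scripted so that both checks run back to back and the first failure is surfaced. The following is only a minimal sketch: the installer path and config directory are the ones quoted in the steps, while the helper names wait_for_command and run_checks are hypothetical and not part of openshift-install.

```python
import subprocess
from typing import List

# Paths taken from the reproduction steps; adjust for other environments.
INSTALLER = "/usr/local/bin/openshift-install"
CONFIG_DIR = "/home/kni/clusterconfigs"

# The two wait-for targets used in steps 2 and 3.
TARGETS = ("bootstrap-complete", "install-complete")


def wait_for_command(target: str) -> List[str]:
    """Build the argv for one wait-for check (hypothetical helper)."""
    if target not in TARGETS:
        raise ValueError(f"unexpected wait-for target: {target}")
    return [
        INSTALLER,
        "--dir", CONFIG_DIR,
        "--log-level", "debug",
        "wait-for", target,
    ]


def run_checks() -> int:
    """Run both checks in order; return the first non-zero exit code, else 0."""
    for target in TARGETS:
        result = subprocess.run(wait_for_command(target))
        if result.returncode != 0:
            # In the failing runs described here, this is expected to
            # trigger on install-complete, not on bootstrap-complete.
            return result.returncode
    return 0
```

      Calling run_checks() on the provisioner host would reproduce the sequence above; a non-zero return value after bootstrap-complete has passed corresponds to the failure described in step 3.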

      Actual results:

      The NTO symptoms differ from run to run. They all involve the default profile that is installed when the operator is created (no custom Tuned profile has been applied).
      
      We have captured these two cases:
      
      1) The NTO stays in Progressing status indefinitely, waiting for 1/7 profiles to be applied. openshift-install prints this log message:
      
      level=debug msg=Cluster Operator node-tuning is Progressing=True LastTransitionTime=2025-05-15 00:02:53 -0500 CDT DurationSinceTransition=2334s Reason=ProfileProgressing Message=Waiting for 1/7 Profiles to be applied
      
      2) In other cases, the NTO reports Degraded status directly, stating that some profiles have a bootcmdline conflict (which is reminiscent of OCPBUGS-47729):
      
      level=error msg=Cluster operator node-tuning Degraded is True with ProfileConflict: 2/7 Profiles with bootcmdline conflict
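
      To triage many job logs, the two messages above can be reduced to structured fields (condition, reason, and the n/m profile count). This is only a sketch: the regular expressions are derived solely from the two lines quoted in this report, so the exact log format is an assumption, and parse_line and its field names are hypothetical.

```python
import re
from typing import Dict, Optional, Union

# Case 1 shape: "Cluster Operator <name> is <Condition>=<Status> ... Reason=<R> Message=<M>"
PROGRESS_RE = re.compile(
    r"Cluster Operator (?P<operator>\S+) is (?P<condition>\w+)=(?P<status>\w+) "
    r".*?Reason=(?P<reason>\S+) Message=(?P<message>.*)$"
)

# Case 2 shape: "Cluster operator <name> <Condition> is <Status> with <Reason>: <Message>"
DEGRADED_RE = re.compile(
    r"Cluster operator (?P<operator>\S+) (?P<condition>\w+) is (?P<status>\w+) "
    r"with (?P<reason>\w+): (?P<message>.*)$"
)

# "1/7 Profiles" or "2/7 Profiles" inside the message text.
COUNT_RE = re.compile(r"(\d+)/(\d+) Profiles")


def parse_line(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return the operator condition found in one installer log line, or None."""
    text = line.strip()
    match = PROGRESS_RE.search(text) or DEGRADED_RE.search(text)
    if match is None:
        return None
    fields: Dict[str, Union[str, int]] = dict(match.groupdict())
    count = COUNT_RE.search(match.group("message"))
    if count is not None:
        # Number of affected profiles and total number of profiles.
        fields["affected"] = int(count.group(1))
        fields["total"] = int(count.group(2))
    return fields
```

      Feeding each line of a saved openshift-install log to parse_line and keeping the non-None results gives one record per operator condition, which makes it easier to tell the Progressing case from the ProfileConflict case across jobs.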

      Expected results:

      NTO should be installed correctly and without problems.

      Additional info:

      All deployments were launched with Distributed-CI (DCI). The jobs where the issue was observed are listed here, grouped by the two cases reported above:
      
      1) NTO in Progressing status - seen in 4.18 and 4.19
      
      - OpenShift 4.18 nightly 2025-05-14 07:59 - https://www.distributed-ci.io/jobs/8b584162-a714-48fa-b548-b6778c85373a/jobStates?sort=date&task=723dfec0-b79a-4152-9920-83ba336ec4ed
      - OpenShift 4.18 nightly 2025-05-15 13:07 - https://www.distributed-ci.io/jobs/ed91b74d-f60a-4289-a40e-57eedac7a5f1/jobStates?sort=date&task=6c6d40b0-260c-4cbc-9f36-42ec034c1517
      - OpenShift 4.19.0 rc.2 - https://www.distributed-ci.io/jobs/83f3586f-4439-4f46-80c1-8bb05ae18537/jobStates?sort=date&task=54b1d013-17cf-449f-ac43-57757a18c579
      - OpenShift 4.18.14 - https://www.distributed-ci.io/jobs/bd4b6e8f-b694-4056-ba66-16b817a359e5/jobStates?sort=date&task=ca45c507-7e5b-45c9-b4fc-82f70962a9fa
      
      2) NTO in Degraded status with bootcmdline conflict - seen in 4.19 and 4.20
      
      - [must-gather available] OpenShift 4.20 nightly 2025-05-15 18:06 - https://www.distributed-ci.io/jobs/698236c0-6622-453a-b399-df446478daff/jobStates?sort=date&task=8a5c4a85-6021-4c1f-a87f-3ceeb369e80e
      - OpenShift 4.19 nightly 2025-05-18 11:00 - https://www.distributed-ci.io/jobs/16049a57-2544-4e2d-b4d6-5dcd65f2d608/jobStates?sort=date&task=813853e1-129c-4c7b-85be-4c28087d75e1
      - [must-gather available] OpenShift 4.20 nightly 2025-05-18 20:09 - https://www.distributed-ci.io/jobs/0a475951-5a7f-40d6-a78c-f3fc44fb9462/jobStates?sort=date&task=8b4c070b-f7eb-4a0f-a555-660cbd11a370

              jmencak Jiri Mencak
              raperez@redhat.com Ramon Perez
              Liquan Cui Liquan Cui
              Votes: 0
              Watchers: 25
