OpenShift Bugs / OCPBUGS-49894

OCL: disabling OCL does not work in clusters that use a proxy

    • Important
    • No
    • 3
    • MCO Sprint 266, MCO Sprint 267, MCO Sprint 268, MCO Sprint 269
    • 4
    • Rejected
    • False

      Description of problem:

      In a cluster that uses a proxy, if we enable OCL (on-cluster layering) in an MCP (MachineConfigPool) and then disable it, the nodes cannot rejoin the cluster after they reboot.
          

      Version-Release number of selected component (if applicable):

      4.18.0-0.nightly-arm64-2025-02-05-055102
          

      How reproducible:

      Always
          

      Steps to Reproduce:

          1. Create a MachineOSConfig (MOSC) resource to enable OCL in the worker pool.
          2. Wait for the osImage to be built and applied to all worker nodes.
          3. Delete the MOSC resource to disable OCL.
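      The steps above can be scripted roughly as follows. This is a minimal dry-run sketch, not the QE job's actual automation: the manifest file mosc-worker.yaml and the MachineOSConfig name worker are hypothetical, and it assumes the pool reports the rollout through its usual Updating and Updated conditions. Commands are only printed unless dry_run=False.

```python
"""Dry-run sketch of the reproduction steps (hypothetical names, see lead-in)."""
import shlex
import subprocess

POOL = "worker"                      # the pool used in the report
MOSC_MANIFEST = "mosc-worker.yaml"   # hypothetical file holding the MachineOSConfig
MOSC_NAME = "worker"                 # hypothetical MachineOSConfig name


def wait_for(pool: str, condition: str, timeout: str = "90m") -> list[str]:
    """Build an `oc wait` call that blocks until the pool reports `condition`."""
    return ["oc", "wait", f"mcp/{pool}",
            f"--for=condition={condition}", f"--timeout={timeout}"]


def reproduction_steps() -> list[list[str]]:
    """Return the steps from the report as oc argument lists."""
    return [
        ["oc", "apply", "-f", MOSC_MANIFEST],            # 1. enable OCL on the pool
        wait_for(POOL, "Updating"),                       # 2. layered image starts rolling out...
        wait_for(POOL, "Updated"),                        #    ...and is applied to every worker
        ["oc", "delete", "machineosconfig", MOSC_NAME],   # 3. disable OCL
        wait_for(POOL, "Updating"),                       #    revert rollout starts...
        wait_for(POOL, "Updated"),                        #    ...expected to time out on the proxy clusters in this report
    ]


def run(steps: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in steps:
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(reproduction_steps())
```

      Waiting for Updating before Updated avoids returning immediately while the pool still reports Updated from before the change; it would hang until the timeout only if a rollout finished before the wait started.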
          

      Actual results:

      Nodes are rebooted, and after the reboot they cannot rejoin the cluster and remain in NotReady status:
      
      $ oc get nodes
      NAME                          STATUS                        ROLES                  AGE     VERSION
      ip-10-0-50-26.ec2.internal    Ready                         control-plane,master   3h20m   v1.31.4
      ip-10-0-54-149.ec2.internal   NotReady,SchedulingDisabled   worker                 3h11m   v1.31.4
      ip-10-0-75-192.ec2.internal   Ready                         worker                 3h11m   v1.31.4
      ip-10-0-78-138.ec2.internal   Ready                         control-plane,master   3h20m   v1.31.4
      ip-10-0-88-130.ec2.internal   Ready                         control-plane,master   3h20m   v1.31.4
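      To spot the broken nodes in output like the listing above, a small check can flag every node whose STATUS lacks the Ready condition. This is a minimal sketch that parses the default `oc get nodes` table and assumes no cell contains spaces. For scripts, reading `oc get nodes -o json` and checking the Ready entry in status.conditions is sturdier.

```python
"""Flag nodes whose `oc get nodes` STATUS does not include the Ready condition."""

SAMPLE = """\
NAME                          STATUS                        ROLES                  AGE     VERSION
ip-10-0-50-26.ec2.internal    Ready                         control-plane,master   3h20m   v1.31.4
ip-10-0-54-149.ec2.internal   NotReady,SchedulingDisabled   worker                 3h11m   v1.31.4
ip-10-0-75-192.ec2.internal   Ready                         worker                 3h11m   v1.31.4
ip-10-0-78-138.ec2.internal   Ready                         control-plane,master   3h20m   v1.31.4
ip-10-0-88-130.ec2.internal   Ready                         control-plane,master   3h20m   v1.31.4
"""


def not_ready_nodes(table: str) -> list[tuple[str, str]]:
    """Return (name, status) for every node that is not Ready.

    Assumes the default column layout, where no cell contains spaces, so a
    plain whitespace split keeps the cells aligned with the header.
    """
    lines = [line for line in table.splitlines() if line.strip()]
    header = lines[0].split()
    name_col, status_col = header.index("NAME"), header.index("STATUS")
    flagged = []
    for line in lines[1:]:
        cells = line.split()
        status = cells[status_col]
        # STATUS is comma-separated, e.g. "NotReady,SchedulingDisabled";
        # "Ready,SchedulingDisabled" is a cordoned node that is still Ready.
        if "Ready" not in status.split(","):
            flagged.append((cells[name_col], status))
    return flagged


if __name__ == "__main__":
    for name, status in not_ready_nodes(SAMPLE):
        print(f"{name}: {status}")
```

      Run against the sample, it prints only ip-10-0-54-149.ec2.internal: NotReady,SchedulingDisabled.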
      
      
      If we access the broken nodes via SSH, we can see that machine-config-daemon-revert.service is failing:
      
      [core@ip-10-0-49-2 ~]$ journalctl  -u machine-config-daemon-revert.service
      Feb 05 14:53:42 ip-10-0-49-2 systemd[1]: machine-config-daemon-revert.service: Failed to load environment files: No such file or directory
      Feb 05 14:53:42 ip-10-0-49-2 systemd[1]: machine-config-daemon-revert.service: Failed to run 'start-pre' task: No such file or directory
      Feb 05 14:53:42 ip-10-0-49-2 systemd[1]: machine-config-daemon-revert.service: Failed with result 'resources'.
      Feb 05 14:53:42 ip-10-0-49-2 systemd[1]: Failed to start Machine Config Daemon Revert.
      
      The cause is that the file /etc/mco/proxy.env does not exist on the node:
      
      [core@ip-10-0-49-2 ~]$ ls /etc/mco/
      internal-registry-pull-secret.json  machineconfig-revert.json
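      systemd logs "Failed to load environment files" when a unit lists an EnvironmentFile= that does not exist and is not marked optional with a leading "-". The sketch below reproduces that check offline. It assumes, because the report does not show the unit, that machine-config-daemon-revert.service references /etc/mco/proxy.env as a required EnvironmentFile=; the unit text and /etc/mco/optional.env are hypothetical.

```python
"""Report required EnvironmentFile= paths of a systemd unit that are missing."""
import os
from typing import Callable

# Hypothetical, trimmed unit text: the report does not show the real unit file.
UNIT = """\
[Unit]
Description=Machine Config Daemon Revert

[Service]
EnvironmentFile=/etc/mco/proxy.env
EnvironmentFile=-/etc/mco/optional.env
"""


def environment_files(unit_text: str) -> list[tuple[str, bool]]:
    """Return (path, optional) for each EnvironmentFile= directive, in order."""
    entries: list[tuple[str, bool]] = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("EnvironmentFile="):
            continue
        value = line[len("EnvironmentFile="):].strip()
        if not value:
            entries.clear()               # an empty assignment resets the list
            continue
        optional = value.startswith("-")  # "-" means: skip silently if missing
        entries.append((value[1:] if optional else value, optional))
    return entries


def missing_required(unit_text: str,
                     exists: Callable[[str], bool] = os.path.exists) -> list[str]:
    """Required paths that do not exist; systemd refuses to start the unit if any are missing."""
    return [path for path, optional in environment_files(unit_text)
            if not optional and not exists(path)]


if __name__ == "__main__":
    # Simulate the node from the report: only these two files exist in /etc/mco/.
    present = {"/etc/mco/internal-registry-pull-secret.json",
               "/etc/mco/machineconfig-revert.json"}
    print(missing_required(UNIT, exists=present.__contains__))
```

      With the directory contents shown in the ls output, the sketch reports /etc/mco/proxy.env as the missing required file.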
      
      
          

      Expected results:

      When we remove the MOSC resource, OCL should be disabled and the nodes should stop using the layered image without problems.
      
          

      Additional info:

      We don't see this issue happening in clusters that do not use a proxy configuration.
      
      
      The issue is causing this job to fail: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/job-history/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.18-arm64-nightly-aws-ipi-longrun-mco-proxy-fips-p1-f28
          

              zzlotnik@redhat.com Zack Zlotnik
              sregidor@redhat.com Sergio Regidor de la Rosa