OpenShift Bugs / OCPBUGS-16622

4.13/4.14 MCDs do not work with FIPS enabled golang builders


    • Critical
    • No
    • MCO Sprint 239
    • 1
    • Rejected
    • False

      Updated Description:

      Over a node's lifespan, the MCD can go through multiple iterations of RHEL8 and RHEL9. This was not a problem until we turned on FIPS-enabled golang with dynamic linking, which requires the running MCD binary (in the container or on the host) to always match the version built for the host. Two things complicate this: the early boot process (machine-config-daemon-pull/firstboot.service) can run a different version from the rest of the cluster nodes because the bootimage version is not updated, and the container chroots into the host (dynamically going from rhel8 to rhel9). We therefore need a better process to ensure the right binary is always used.
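
      One way to picture "ensuring the right binary is used" is to derive the host's RHEL major version from its os-release file and choose a matching build. The sketch below is illustrative only and is not the MCO's implementation: the function names, the per-version binary paths, and the assumption that the host exposes a RHEL_VERSION field (falling back to VERSION_ID when ID is "rhel") are all hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// rhelMajor extracts the RHEL major version from os-release content.
// Assumption: RHEL_VERSION is present on RHCOS-style hosts; on plain RHEL
// (ID=rhel) we fall back to VERSION_ID.
func rhelMajor(osRelease string) (int, error) {
	fields := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(osRelease))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		fields[key] = strings.Trim(val, `"'`)
	}
	version := fields["RHEL_VERSION"]
	if version == "" && fields["ID"] == "rhel" {
		version = fields["VERSION_ID"]
	}
	if version == "" {
		return 0, fmt.Errorf("no RHEL version found in os-release")
	}
	majorStr, _, _ := strings.Cut(version, ".")
	return strconv.Atoi(majorStr)
}

// daemonBinaryFor maps a RHEL major version to a binary path.
// The paths are hypothetical placeholders for per-version builds.
func daemonBinaryFor(major int) (string, error) {
	switch major {
	case 8, 9:
		return fmt.Sprintf("/usr/bin/machine-config-daemon.rhel%d", major), nil
	default:
		return "", fmt.Errorf("unsupported RHEL major version %d", major)
	}
}

func main() {
	// A containerized daemon would read the host's file through its
	// host mount rather than its own /etc/os-release.
	data, err := os.ReadFile("/etc/os-release")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read os-release:", err)
		return
	}
	major, err := rhelMajor(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine RHEL version:", err)
		return
	}
	path, err := daemonBinaryFor(major)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("would run:", path)
}
```

      Making the choice at the point where the daemon is about to execute on the host, rather than at image build time, is what would let the same logic cover the firstboot and chroot cases described above.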

       

      This flow is currently being tested in https://github.com/openshift/machine-config-operator/pull/3799

       

      Description of problem:

      MCO CI started failing this week, and the failure has also made it into 4.14 nightlies. See also: https://issues.redhat.com/browse/TRT-1143. The failure manifests as a warning in the MCO. In an MCD log, you will see a failure like:
      
      W0712 08:52:15.475268    7971 daemon.go:1089] Got an error from auxiliary tools: kubelet health check has failed 3 times: Get "http://localhost:10248/healthz": dial tcp: lookup localhost: device or resource busy
      
      The root cause so far seems to be that the 4.14 builder switched from a regular golang 1.20.3 to golang 1.20.5 with FIPS and dynamic linking, which is when the failures began. Most functionality is not broken, but the daemon subroutine that does the kubelet health check appears to be unable to reach the localhost endpoint.
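
      For readers unfamiliar with the check that produces the log line above, the following is a minimal sketch of a kubelet health probe with a consecutive-failure threshold. It is not the code in daemon.go: the names probe, failureTracker, and errUnhealthy, the probe interval, and the client timeout are assumptions made for illustration. Only the endpoint URL and the "failed 3 times" threshold are taken from the log.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"os"
	"time"
)

// errUnhealthy marks a response that arrived but was not 200 OK.
var errUnhealthy = errors.New("kubelet health endpoint returned a non-200 status")

// probeInterval is an assumed delay between probes.
const probeInterval = time.Second

// probe performs a single GET against the health endpoint.
// A DNS failure such as "lookup localhost: ..." surfaces here as err.
func probe(client *http.Client, url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%w: %s", errUnhealthy, resp.Status)
	}
	return nil
}

// failureTracker counts consecutive probe failures and reports an error
// once the count reaches the threshold. A success resets the count.
type failureTracker struct {
	threshold   int
	consecutive int
}

func (t *failureTracker) observe(err error) error {
	if err == nil {
		t.consecutive = 0
		return nil
	}
	t.consecutive++
	if t.consecutive >= t.threshold {
		return fmt.Errorf("kubelet health check has failed %d times: %w", t.consecutive, err)
	}
	return nil
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	tracker := &failureTracker{threshold: 3}
	for i := 0; i < 3; i++ {
		err := tracker.observe(probe(client, "http://localhost:10248/healthz"))
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
		}
		time.Sleep(probeInterval)
	}
}
```

      In this sketch a name-resolution error for localhost is indistinguishable from an unreachable kubelet, which is consistent with the report's observation that the failure shows up in the health check while most other functionality keeps working.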
      
      One possibility is that the rhel8 daemon chroot'ing into the rhel9 host and running these commands is causing the issue. Regardless, there are a number of issues with the rhel8/rhel9 duality in the MCD that we would need to address in 4.13/4.14.
      
      Also tangentially related: https://issues.redhat.com/browse/MCO-663

      Version-Release number of selected component (if applicable):

      4.14

      How reproducible:

      Always

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

       

      Expected results:

       

      Additional info:

       

            rhn-engineering-skumari Sinny Kumari
            jerzhang@redhat.com Yu Qi Zhang
            Sergio Regidor de la Rosa Sergio Regidor de la Rosa
            Votes: 0
            Watchers: 6

              Created:
              Updated:
              Resolved: