OCPBUGS-48776 (OpenShift Bugs)

Driver Toolkit imagestream tags unavailable for worker node versions

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • None
    • 4.13, 4.12, 4.14, 4.15, 4.16, 4.17, 4.18, 4.19
    • HyperShift
    • None

      Description of problem:

      Driver Toolkit images are used to build and enable kernel-dependent features on RHCOS nodes. Many OpenShift operators rely on them to enable drivers at the hardware level, including (but not limited to) the NVIDIA GPU operator, the Intel Gaudi operator, and the Kernel Module Management (KMM) operator.
      
      These operators typically inspect the ostree version of the RHCOS worker node they are running on and use the Driver Toolkit image tagged with that version.
      
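      For example, the worker node below reports an ostree version for which the cluster has no matching driver-toolkit tag: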
      
      sh-5.1# cat /etc/os-release | grep OSTREE_VERSION
      OSTREE_VERSION="417.94.202412250812-0"
      
      ❯ oc get imagetags -n openshift | grep driver-toolkit
      driver-toolkit:417.94.202411070820-0   Scheduled   image/sha256:d32e77d1ac790e382faced2a0b1276abaede7ad520233ac77ff4a3ca279e0b29   1         2 weeks ago
      driver-toolkit:417.94.202412180008-0   Scheduled   image/sha256:08b87914b52745e957a46caa9e3085902557f609c18adf7d60ad7a8b6db19af2   1         5 days ago
      driver-toolkit:latest                  Scheduled   image/sha256:08b87914b52745e957a46caa9e3085902557f609c18adf7d60ad7a8b6db19af2   2         5 days ago
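
      The lookup that fails here can be sketched in a few lines. This is only an illustration of the matching logic, not how any particular operator implements it: the function names are hypothetical, and the input is assumed to be the plain-text output of the two commands shown above.

```python
# Sample inputs copied from the transcript above.
OS_RELEASE = 'OSTREE_VERSION="417.94.202412250812-0"\n'

IMAGETAGS = """\
driver-toolkit:417.94.202411070820-0   Scheduled   image/sha256:d32e77d1ac790e382faced2a0b1276abaede7ad520233ac77ff4a3ca279e0b29   1         2 weeks ago
driver-toolkit:417.94.202412180008-0   Scheduled   image/sha256:08b87914b52745e957a46caa9e3085902557f609c18adf7d60ad7a8b6db19af2   1         5 days ago
driver-toolkit:latest                  Scheduled   image/sha256:08b87914b52745e957a46caa9e3085902557f609c18adf7d60ad7a8b6db19af2   2         5 days ago
"""


def parse_ostree_version(os_release_text):
    """Return the OSTREE_VERSION value from os-release style text, or None."""
    for line in os_release_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "OSTREE_VERSION":
            # Values in os-release may be wrapped in double quotes.
            return value.strip().strip('"')
    return None


def driver_toolkit_tags(imagetags_output):
    """Collect driver-toolkit tag names from `oc get imagetags` text output."""
    tags = set()
    for line in imagetags_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("driver-toolkit:"):
            # The first column has the form "driver-toolkit:<tag>".
            tags.add(fields[0].split(":", 1)[1])
    return tags


def has_matching_tag(os_release_text, imagetags_output):
    """True if a driver-toolkit tag exists for the node's ostree version."""
    version = parse_ostree_version(os_release_text)
    return version is not None and version in driver_toolkit_tags(imagetags_output)


if __name__ == "__main__":
    print(has_matching_tag(OS_RELEASE, IMAGETAGS))
```

      With the values from this report, the check returns False: the worker reports 417.94.202412250812-0, while the only tags available are 417.94.202411070820-0, 417.94.202412180008-0, and latest.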
      
      On HyperShift HostedClusters, the control plane version and the worker node versions often do not match because the architecture decouples them. Driver Toolkit image tags are present only for versions that match the control plane version, not for the RHCOS worker version.
      
      Given that these Driver Toolkit images allow operators to enable kernel and hardware features on the RHCOS worker nodes, HyperShift clusters should, by default, make the Driver Toolkit images that match the NodePool version available on the HostedClusters.

      Version-Release number of selected component (if applicable):

          All

      How reproducible:

          100%

      Steps to Reproduce:

          1. Create a HostedCluster on one OCP version
          2. Create a NodePool on a different OCP version
          3. Install the NVIDIA GPU operator, Intel Gaudi operator, KMM operator, or similar
          4. Observe that the installation fails, then check the imagetags available in the cluster

      Actual results:

          Driver Toolkit images that match the RHCOS worker version do not exist

      Expected results:

          Driver Toolkit images that match the RHCOS worker version exist

      Additional info:

          

              asegurap1@redhat.com Antoni Segura Puimedon
              hsueki Hidematsu Sueki
              Jie Zhao Jie Zhao
              IBM Employee