OCPBUGS-61595 (OpenShift Bugs)

Issues with loading OOT drivers for QDU x100 DU PCIe cards from v4.14+

    • Bug
    • Resolution: Unresolved
    • Normal
    • 4.14, 4.15, 4.16
    • kmm
    • Quality / Stability / Reliability
    • Moderate
    • x86_64

      Description of problem:

      The Kernel Module Management (KMM) Operator was used to load drivers for the QDU x100 DU PCIe card on OCP v4.12 without issues. After the upgrade to v4.14+, the mhi module is shipped in-tree with kernel 5.14 (https://elixir.bootlin.com/linux/v5.14.21/source/drivers/bus/mhi/core, obj-$(CONFIG_MHI_BUS) += mhi.o).

      As a result, loading the out-of-tree module now fails with the symbol version errors listed under "Actual results".
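
      To confirm which of the conflicting module names the node's kernel already provides, the module index can be inspected. The following is a minimal sketch, assuming the text of /lib/modules/<kernel release>/modules.dep from the affected node is available (for example from a debug pod); the function name modules_shipped is hypothetical. Note that modules.dep lists every module installed under that directory, so a match means "present on the node", not necessarily "built from the upstream tree".

```python
import os
import re

# Module files may be plain (.ko) or compressed (.ko.xz, .ko.gz, .ko.zst).
_MODULE_FILE = re.compile(r"\.ko(\.(xz|gz|zst))?$")


def modules_shipped(modules_dep_text, names):
    """Return the sorted subset of `names` that appear as modules in modules.dep text."""
    present = set()
    for line in modules_dep_text.splitlines():
        # Each line has the form "<path to module>: <dependencies...>".
        path = line.split(":", 1)[0].strip()
        base = os.path.basename(path)
        if not _MODULE_FILE.search(base):
            continue
        # The kernel treats '-' and '_' in module names as equivalent.
        present.add(_MODULE_FILE.sub("", base).replace("-", "_"))
    return sorted(n for n in names if n.replace("-", "_") in present)
```

      Passing the names from inTreeModulesToRemove, plus any other module that exports the symbols reported in dmesg, shows which of them would have to be unloaded before the out-of-tree build can be inserted.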

      Version-Release number of selected component (if applicable):

      OCP v4.14+

      How reproducible:

      Any OCP version that uses kernel v5.14 or later may hit this issue.

      Steps to Reproduce:

      Apply the following Module resource on a v4.14+ cluster:
      
      Kernel Module Source
      kind: Module
      metadata:
        name: csmx100
        namespace: x100-operator-resources
      spec:
        moduleLoader:
          container:
            modprobe:
              moduleName: csm_dp
              modulesLoadingOrder:
                - csm_dp
                - mhi mhi_uci mhi_net wwan_mhi mhi_pci mhi_ptp
            inTreeModulesToRemove: [mhi, mhi_net, mhi_wwan_ctrl]
            kernelMappings:
              - regexp: '^.+$'
                containerImage: #"<repo_url / image_prefix>-${KERNEL_FULL_VERSION}"
                build:
                  buildArgs:
                  - name: LONG_AU_TAG
                    value: #"ReplaceAUTag"
                  dockerfileConfigMap:
                    name: "x100-kosyncbuild-dockerfile-identifier"
            securityContext:
              privileged: true
          hostNetwork: true
          serviceAccount: x100-kmodule-sa
          serviceAccountName: x100-kmodule-sa
        imageRepoSecret:
          name: qcrepo-pull-secret
        selector:
          qualcomm.com/x100.present: "true"    
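
      As a sanity check on the resource above, the loading order can be validated before it is applied. This is a minimal sketch under two assumptions: that KMM expects the first modulesLoadingOrder entry to equal moduleName (as described in the KMM documentation), and that each list entry should name exactly one module. Whether KMM rejects or tolerates a space-separated entry is not confirmed here. The function name check_loading_order is hypothetical.

```python
def check_loading_order(module_name, loading_order):
    """Return a list of human-readable problems found in a modulesLoadingOrder list."""
    problems = []
    # Assumed KMM convention: the first entry (loaded last) matches moduleName.
    if loading_order and loading_order[0] != module_name:
        problems.append(
            f"first entry {loading_order[0]!r} does not match moduleName {module_name!r}"
        )
    # Assumption: one module name per list item, so whitespace is suspicious.
    for entry in loading_order:
        if len(entry.split()) != 1:
            problems.append(f"entry {entry!r} is not a single module name")
    return problems
```

      With the values from this report, check_loading_order("csm_dp", ["csm_dp", "mhi mhi_uci mhi_net wwan_mhi mhi_pci mhi_ptp"]) flags the second entry, which suggests writing each module as its own list item.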

      Actual results:

      The following messages appear in dmesg:
      
      [200049.891210] wwan_mhi: disagrees about version of symbol wwan_port_rx
      [200049.891214] wwan_mhi: Unknown symbol wwan_port_rx (err -22)
      [200049.891217] wwan_mhi: disagrees about version of symbol wwan_create_port
      [200049.891218] wwan_mhi: Unknown symbol wwan_create_port (err -22)
      [200087.320545] mhi_wwan_ctrl: disagrees about version of symbol mhi_queue_is_full
      [200087.320548] mhi_wwan_ctrl: Unknown symbol mhi_queue_is_full (err -22)
      [200087.320557] mhi_wwan_ctrl: disagrees about version of symbol mhi_queue_skb
      [200087.320557] mhi_wwan_ctrl: Unknown symbol mhi_queue_skb (err -22)
      [200087.320562] mhi_wwan_ctrl: disagrees about version of symbol mhi_driver_unregister
      [200087.320563] mhi_wwan_ctrl: Unknown symbol mhi_driver_unregister (err -22)
      [200087.320566] mhi_wwan_ctrl: disagrees about version of symbol mhi_unprepare_from_transfer
      [200087.320567] mhi_wwan_ctrl: Unknown symbol mhi_unprepare_from_transfer (err -22)
      [200087.320569] mhi_wwan_ctrl: disagrees about version of symbol mhi_prepare_for_transfer
      [200087.320570] mhi_wwan_ctrl: Unknown symbol mhi_prepare_for_transfer (err -22)
      [200087.320571] mhi_wwan_ctrl: disagrees about version of symbol mhi_get_free_desc_count
      [200087.320572] mhi_wwan_ctrl: Unknown symbol mhi_get_free_desc_count (err -22)
      [200087.320575] mhi_wwan_ctrl: disagrees about version of symbol __mhi_driver_register
      [200087.320575] mhi_wwan_ctrl: Unknown symbol __mhi_driver_register (err -22)
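
      The "disagrees about version of symbol" message is printed when the symbol CRC a module was built against differs from the one exported by the running kernel (module versioning). To see at a glance which modules are affected and by which symbols, the log can be grouped as follows. This is a minimal sketch that assumes the dmesg line format shown above; the function name symbol_mismatches is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [200049.891210] wwan_mhi: disagrees about version of symbol wwan_port_rx
_MISMATCH = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<module>\S+): "
    r"disagrees about version of symbol (?P<symbol>\S+)"
)


def symbol_mismatches(dmesg_text):
    """Map each module name to the sorted list of symbols it disagrees about."""
    found = defaultdict(set)
    for line in dmesg_text.splitlines():
        match = _MISMATCH.match(line.strip())
        if match:
            found[match.group("module")].add(match.group("symbol"))
    return {module: sorted(symbols) for module, symbols in found.items()}
```

      Applied to the output above, this groups wwan_port_rx and wwan_create_port under wwan_mhi, and the mhi_* symbols under mhi_wwan_ctrl.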

      Expected results:

      The kernel module loads and is usable.

      Additional info:

      They still need to load their OOT module because the upstream version does not include the drivers required for the QDU x100 DU PCIe card. I suggested that they follow the documentation, section 4.7, "Replacing in-tree modules with out-of-tree modules" (https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html/specialized_hardware_and_driver_enablement/kernel-module-management-operator#kmm-replacing-in-tree-modules-with-out-of-tree-modules_kernel-module-management-operator).

      The issue still occurs.

              ybettan@redhat.com Yoni Bettan
              rhn-support-adubey Akash Dubey
              Constantin Daniel Vultur