OpenShift Bugs / OCPBUGS-70260

Traffic exchange between DPDK workloads is causing kernel taint #9 (kernel issued warning) in OCP worker nodes


    • Type: Bug
    • Resolution: Unresolved
    • 4.14.z
    • RHCOS

      (I've created this issue under the RHCOS component, since it looks kernel-related, but it comes from a networking-related DPDK workload. Please feel free to redirect the issue to a more suitable component if required.)
      Description of problem:

      Traffic exchange between DPDK workloads is causing kernel taint #9 (kernel issued warning) in OCP worker nodes
          

      Version-Release number of selected component (if applicable):

      OpenShift 4.14.z
      - The problem does not appear in OpenShift 4.14.60, the latest GA version at the time of writing; the same scenario works there
      - It was first observed with OpenShift 4.14 nightly 2025-12-07 05:09
      - This report is based on OpenShift 4.14 nightly 2025-12-25 14:26 (the latest nightly available at the time of writing)
      - It has not been observed in z-stream versions of higher OpenShift releases
          

      How reproducible:

      100%
          

      Steps to Reproduce:

          1. Deploy an OpenShift baremetal cluster with IPI installation mode (3 master and 3 worker nodes). We used Distributed-CI (DCI) for our deployment.
          2. Create a Performance Profile, such as the following one:
      
      ---
      kind: PerformanceProfile
      apiVersion: "performance.openshift.io/v2"
      metadata:
        name: blueprint-profile
      spec:
        cpu:
          isolated: "1-19,21-39,41-59,61-79"
          reserved: "0,40,20,60"
        additionalKernelArgs:
          - nohz_full=1-19,21-39,41-59,61-79
        hugepages:
          pages:
            - size: "1G"
              count: 32
              node: 0
            - size: "1G"
              count: 32
              node: 1
            - size: "2M"
              count: 12000
              node: 0
            - size: "2M"
              count: 12000
              node: 1
        realTimeKernel:
          enabled: false
        workloadHints:
          realTime: false
          highPowerConsumption: false
          perPodPowerManagement: true
        net:
          userLevelNetworking: false
        numa:
          topologyPolicy: "single-numa-node"
        nodeSelector:
          node-role.kubernetes.io/worker: ""
      ...
      
          3. Create SRIOV policies and networks for the DPDK workload, such as the following ones:
      
      ---
      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetworkNodePolicy
      metadata:
        name: example-cnf-policy1
        namespace: openshift-sriov-network-operator
      spec:
        deviceType: vfio-pci
        isRdma: false
        mtu: 9000
        nicSelector:
          deviceID: "158b"
          pfNames:
          - ens2f0#0-7
          vendor: 8086
        nodeSelector:
          node-role.kubernetes.io/worker: ""
        numVfs: 8
        priority: 99
        resourceName: example_cnf_res1
      
      ---
      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetwork
      metadata:
        name: example-cnf-net1
        namespace: openshift-sriov-network-operator
      spec:
        capabilities: '{"mac": true, "ips": true}'
        ipam: '{"type": "static"}'
        networkNamespace: example-cnf
        resourceName: example_cnf_res1
        spoofChk: "off"
        trust: "on"
        vlan: 827
      
          4. Deploy the DPDK application, using SRIOV to configure the network interfaces, and start traffic forwarding. This report is based on the example-cnf workload (https://github.com/openshift-kni/example-cnf), deployed with the following Ansible automation: https://github.com/redhatci/ansible-collection-redhatci-ocp/tree/main/roles/example_cnf_deploy, but the problem has also been observed with other DPDK workloads, such as F5 Service Proxy for Kubernetes (SPK).
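
As a quick sanity check on the Performance Profile from step 2, the sketch below expands the `isolated` and `reserved` CPU lists and confirms that they do not overlap. It assumes the worker exposes CPUs 0-79 (inferred from the highest CPU index in the profile, not stated in this report), and `parse_cpu_list` is a hypothetical helper, not part of any OpenShift tooling.

```python
def parse_cpu_list(spec: str) -> set[int]:
    """Expand a kernel-style CPU list such as "1-19,21-39" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Inclusive range, e.g. "1-19" covers CPUs 1 through 19.
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Values copied from the Performance Profile in step 2.
isolated = parse_cpu_list("1-19,21-39,41-59,61-79")
reserved = parse_cpu_list("0,40,20,60")

# CPUs listed in both sets (should be empty).
overlap = isolated & reserved
# Assumption: the node has 80 CPUs (0-79); adjust for other hardware.
missing = set(range(80)) - (isolated | reserved)

print("overlap:", sorted(overlap))
print("unassigned CPUs:", sorted(missing))
```

In the dmesg excerpt under "Actual results", the warnings were logged on CPUs 0 and 20, both of which are in the reserved set.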
          

      Actual results:

      When the traffic generator starts sending traffic to the DPDK application, the following warnings appear in dmesg on the worker node where the traffic generator is running (this is only an excerpt; there are more):
      
      [ 3708.455790] ------------[ cut here ]------------
      [ 3708.455806] Type was not set for devlink port.
      [ 3708.455829] WARNING: CPU: 0 PID: 44628 at net/core/devlink.c:9832 devlink_port_type_warn+0x11/0x20
      [ 3708.455840] Modules linked in: iavf vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio xt_addrtype xt_connmark xt_MASQUERADE nfs_layout_nfsv41_files rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs cfg80211 veth xt_mark ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink xt_nat nft_chain_nat xt_CT vhost_net vhost vhost_iotlb tap tun xt_REDIRECT xt_conntrack xt_comment nft_compat nft_counter nf_tables 8021q garp mrp stp llc bonding rfkill geneve ip6_udp_tunnel udp_tunnel nfnetlink_cttimeout nfnetlink openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay ext4 mbcache jbd2 intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass rapl intel_cstate ipmi_ssif intel_uncore mgag200 i2c_algo_bit drm_shmem_helper pcspkr drm_kms_helper ses enclosure syscopyarea mei_me sysfillrect
      [ 3708.455901]  sysimgblt fb_sys_fops intel_pch_thermal hpilo mei lpc_ich acpi_ipmi ioatdma ipmi_si dca ipmi_devintf ipmi_msghandler acpi_tad acpi_power_meter ip_tables drm xfs libcrc32c dm_multipath ib_umad ib_ipoib mlx5_ib rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm irdma sd_mod ice sg gnss bnxt_re ib_uverbs ib_core nvme_tcp mlx5_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme_fabrics nvme mlxfw bnxt_en smartpqi psample i40e scsi_transport_sas nvme_core tls pci_hyperv_intf hpwdt nvme_common tg3 t10_pi wmi dm_mirror dm_region_hash dm_log dm_mod fuse
      [ 3708.455943] CPU: 0 PID: 44628 Comm: kworker/0:2 Not tainted 5.14.0-284.151.1.el9_2.x86_64 #1
      [ 3708.455945] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 07/31/2024
      [ 3708.455948] Workqueue: events devlink_port_type_warn
      [ 3708.455951] RIP: 0010:devlink_port_type_warn+0x11/0x20
      [ 3708.455954] Code: ff 48 c7 03 40 35 98 9d 45 31 c0 e9 b5 fe ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 c7 c7 08 76 cc 9d e8 4b 27 1a 00 <0f> 0b c3 cc cc cc cc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 8b
      [ 3708.455957] RSP: 0018:ffffb9c0306efe90 EFLAGS: 00010286
      [ 3708.455959] RAX: 0000000000000000 RBX: ffff977b71849bc0 RCX: 0000000000000027
      [ 3708.455961] RDX: ffff97aa3f8198e8 RSI: 0000000000000001 RDI: ffff97aa3f8198e0
      [ 3708.455962] RBP: ffff976c16bbc268 R08: 0000000000000000 R09: 00000000ffff7fff
      [ 3708.455964] R10: ffffb9c0306efd30 R11: ffffffff9e5e9268 R12: ffff97aa3f82a500
      [ 3708.455965] R13: ffff97aa3f830900 R14: 0000000000000000 R15: ffff97aa3f830905
      [ 3708.455967] FS:  0000000000000000(0000) GS:ffff97aa3f800000(0000) knlGS:0000000000000000
      [ 3708.455969] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3708.455970] CR2: 000000c0054a5f90 CR3: 000000087bfe2001 CR4: 00000000007706f0
      [ 3708.455971] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 3708.455973] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 3708.455974] PKRU: 55555554
      [ 3708.455976] Call Trace:
      [ 3708.455978]  <TASK>
      [ 3708.455979]  process_one_work+0x1e2/0x3b0
      [ 3708.455987]  ? rescuer_thread+0x390/0x390
      [ 3708.455990]  worker_thread+0x50/0x3a0
      [ 3708.455993]  ? rescuer_thread+0x390/0x390
      [ 3708.455996]  kthread+0xd6/0x100
      [ 3708.456001]  ? kthread_complete_and_exit+0x20/0x20
      [ 3708.456003]  ret_from_fork+0x1f/0x30
      [ 3708.456010]  </TASK>
      [ 3708.456011] ---[ end trace a57188da0b5d3ab0 ]---
      [ 3708.456012] ------------[ cut here ]------------
      [ 3708.456013] Type was not set for devlink port.
      [ 3708.456019] WARNING: CPU: 0 PID: 44628 at net/core/devlink.c:9832 devlink_port_type_warn+0x11/0x20
      [ 3708.456022] Modules linked in: iavf vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio xt_addrtype xt_connmark xt_MASQUERADE nfs_layout_nfsv41_files rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs cfg80211 veth xt_mark ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink xt_nat nft_chain_nat xt_CT vhost_net vhost vhost_iotlb tap tun xt_REDIRECT xt_conntrack xt_comment nft_compat nft_counter nf_tables 8021q garp mrp stp llc bonding rfkill geneve ip6_udp_tunnel udp_tunnel nfnetlink_cttimeout nfnetlink openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay ext4 mbcache jbd2 intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass rapl intel_cstate ipmi_ssif intel_uncore mgag200 i2c_algo_bit drm_shmem_helper pcspkr drm_kms_helper ses enclosure syscopyarea mei_me sysfillrect
      [ 3708.456053]  sysimgblt fb_sys_fops intel_pch_thermal hpilo mei lpc_ich acpi_ipmi ioatdma ipmi_si dca ipmi_devintf ipmi_msghandler acpi_tad acpi_power_meter ip_tables drm xfs libcrc32c dm_multipath ib_umad ib_ipoib mlx5_ib rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm irdma sd_mod ice sg gnss bnxt_re ib_uverbs ib_core nvme_tcp mlx5_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme_fabrics nvme mlxfw bnxt_en smartpqi psample i40e scsi_transport_sas nvme_core tls pci_hyperv_intf hpwdt nvme_common tg3 t10_pi wmi dm_mirror dm_region_hash dm_log dm_mod fuse
      [ 3708.456080] CPU: 0 PID: 44628 Comm: kworker/0:2 Tainted: G        W        --------  ---  5.14.0-284.151.1.el9_2.x86_64 #1
      [ 3708.456082] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 07/31/2024
      [ 3708.456082] Workqueue: events devlink_port_type_warn
      [ 3708.456085] RIP: 0010:devlink_port_type_warn+0x11/0x20
      [ 3708.456086] Code: ff 48 c7 03 40 35 98 9d 45 31 c0 e9 b5 fe ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 c7 c7 08 76 cc 9d e8 4b 27 1a 00 <0f> 0b c3 cc cc cc cc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 8b
      [ 3708.456088] RSP: 0018:ffffb9c0306efe90 EFLAGS: 00010286
      [ 3708.456089] RAX: 0000000000000000 RBX: ffff977b71849bc0 RCX: 0000000000000027
      [ 3708.456090] RDX: ffff97aa3f8198e8 RSI: 0000000000000001 RDI: ffff97aa3f8198e0
      [ 3708.456091] RBP: ffff976c0ca32268 R08: 0000000000000000 R09: 00000000ffff7fff
      [ 3708.456091] R10: ffffb9c0306efd30 R11: ffffffff9e5e9268 R12: ffff97aa3f82a500
      [ 3708.456092] R13: ffff97aa3f830900 R14: 0000000000000000 R15: ffff97aa3f830905
      [ 3708.456093] FS:  0000000000000000(0000) GS:ffff97aa3f800000(0000) knlGS:0000000000000000
      [ 3708.456094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3708.456095] CR2: 000000c0054a5f90 CR3: 000000087bfe2001 CR4: 00000000007706f0
      [ 3708.456095] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 3708.456096] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 3708.456097] PKRU: 55555554
      [ 3708.456097] Call Trace:
      [ 3708.456098]  <TASK>
      [ 3708.456098]  process_one_work+0x1e2/0x3b0
      [ 3708.456101]  ? rescuer_thread+0x390/0x390
      [ 3708.456103]  worker_thread+0x50/0x3a0
      [ 3708.456106]  ? rescuer_thread+0x390/0x390
      [ 3708.456125]  kthread+0xd6/0x100
      [ 3708.456127]  ? kthread_complete_and_exit+0x20/0x20
      [ 3708.456129]  ret_from_fork+0x1f/0x30
      [ 3708.456132]  </TASK>
      [ 3708.456132] ---[ end trace a57188da0b5d3ab1 ]---
      [ 3708.456998] ------------[ cut here ]------------
      [ 3708.457001] Type was not set for devlink port.
      [ 3708.457026] WARNING: CPU: 20 PID: 155488 at net/core/devlink.c:9832 devlink_port_type_warn+0x11/0x20
      [ 3708.457036] Modules linked in: iavf vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio xt_addrtype xt_connmark xt_MASQUERADE nfs_layout_nfsv41_files rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs cfg80211 veth xt_mark ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink xt_nat nft_chain_nat xt_CT vhost_net vhost vhost_iotlb tap tun xt_REDIRECT xt_conntrack xt_comment nft_compat nft_counter nf_tables 8021q garp mrp stp llc bonding rfkill geneve ip6_udp_tunnel udp_tunnel nfnetlink_cttimeout nfnetlink openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay ext4 mbcache jbd2 intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass rapl intel_cstate ipmi_ssif intel_uncore mgag200 i2c_algo_bit drm_shmem_helper pcspkr drm_kms_helper ses enclosure syscopyarea mei_me sysfillrect
      [ 3708.457082]  sysimgblt fb_sys_fops intel_pch_thermal hpilo mei lpc_ich acpi_ipmi ioatdma ipmi_si dca ipmi_devintf ipmi_msghandler acpi_tad acpi_power_meter ip_tables drm xfs libcrc32c dm_multipath ib_umad ib_ipoib mlx5_ib rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm irdma sd_mod ice sg gnss bnxt_re ib_uverbs ib_core nvme_tcp mlx5_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme_fabrics nvme mlxfw bnxt_en smartpqi psample i40e scsi_transport_sas nvme_core tls pci_hyperv_intf hpwdt nvme_common tg3 t10_pi wmi dm_mirror dm_region_hash dm_log dm_mod fuse
      [ 3708.457121] CPU: 20 PID: 155488 Comm: kworker/20:0 Tainted: G        W        --------  ---  5.14.0-284.151.1.el9_2.x86_64 #1
      [ 3708.457123] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 07/31/2024
      [ 3708.457124] Workqueue: events devlink_port_type_warn
      [ 3708.457126] RIP: 0010:devlink_port_type_warn+0x11/0x20
      [ 3708.457129] Code: ff 48 c7 03 40 35 98 9d 45 31 c0 e9 b5 fe ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 c7 c7 08 76 cc 9d e8 4b 27 1a 00 <0f> 0b c3 cc cc cc cc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 8b
      [ 3708.457131] RSP: 0018:ffffb9c02bb17e90 EFLAGS: 00010286
      [ 3708.457133] RAX: 0000000000000000 RBX: ffff97b1a0f59440 RCX: 0000000000000027
      [ 3708.457134] RDX: ffff97ea3f2198e8 RSI: 0000000000000001 RDI: ffff97ea3f2198e0
      [ 3708.457135] RBP: ffff97ab4d620268 R08: 0000000000000000 R09: 00000000ffff7fff
      [ 3708.457136] R10: ffffb9c02bb17d30 R11: ffffffff9e5e9268 R12: ffff97ea3f22a500
      [ 3708.457136] R13: ffff97ea3f230900 R14: 0000000000000000 R15: ffff97ea3f230905
      [ 3708.457137] FS:  0000000000000000(0000) GS:ffff97ea3f200000(0000) knlGS:0000000000000000
      [ 3708.457138] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3708.457139] CR2: 000000c004bda000 CR3: 000000087bfe2002 CR4: 00000000007706e0
      [ 3708.457140] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 3708.457141] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 3708.457142] PKRU: 55555554
      [ 3708.457142] Call Trace:
      [ 3708.457144]  <TASK>
      [ 3708.457145]  process_one_work+0x1e2/0x3b0
      [ 3708.457152]  ? rescuer_thread+0x390/0x390
      [ 3708.457155]  worker_thread+0x50/0x3a0
      [ 3708.457157]  ? rescuer_thread+0x390/0x390
      [ 3708.457159]  kthread+0xd6/0x100
      [ 3708.457163]  ? kthread_complete_and_exit+0x20/0x20
      [ 3708.457165]  ret_from_fork+0x1f/0x30
      [ 3708.457171]  </TASK>
      [ 3708.457172] ---[ end trace a57188da0b5d3ab2 ]---
      [ 3708.457173] ------------[ cut here ]------------
      [ 3708.457173] Type was not set for devlink port.
      [ 3708.457177] WARNING: CPU: 20 PID: 155488 at net/core/devlink.c:9832 devlink_port_type_warn+0x11/0x20
      [ 3708.457179] Modules linked in: iavf vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio xt_addrtype xt_connmark xt_MASQUERADE nfs_layout_nfsv41_files rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs cfg80211 veth xt_mark ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink xt_nat nft_chain_nat xt_CT vhost_net vhost vhost_iotlb tap tun xt_REDIRECT xt_conntrack xt_comment nft_compat nft_counter nf_tables 8021q garp mrp stp llc bonding rfkill geneve ip6_udp_tunnel udp_tunnel nfnetlink_cttimeout nfnetlink openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay ext4 mbcache jbd2 intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass rapl intel_cstate ipmi_ssif intel_uncore mgag200 i2c_algo_bit drm_shmem_helper pcspkr drm_kms_helper ses enclosure syscopyarea mei_me sysfillrect
      [ 3708.457207]  sysimgblt fb_sys_fops intel_pch_thermal hpilo mei lpc_ich acpi_ipmi ioatdma ipmi_si dca ipmi_devintf ipmi_msghandler acpi_tad acpi_power_meter ip_tables drm xfs libcrc32c dm_multipath ib_umad ib_ipoib mlx5_ib rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm irdma sd_mod ice sg gnss bnxt_re ib_uverbs ib_core nvme_tcp mlx5_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme_fabrics nvme mlxfw bnxt_en smartpqi psample i40e scsi_transport_sas nvme_core tls pci_hyperv_intf hpwdt nvme_common tg3 t10_pi wmi dm_mirror dm_region_hash dm_log dm_mod fuse
      [ 3708.457229] CPU: 20 PID: 155488 Comm: kworker/20:0 Tainted: G        W        --------  ---  5.14.0-284.151.1.el9_2.x86_64 #1
      [ 3708.457230] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 07/31/2024
      [ 3708.457230] Workqueue: events devlink_port_type_warn
      [ 3708.457232] RIP: 0010:devlink_port_type_warn+0x11/0x20
      [ 3708.457234] Code: ff 48 c7 03 40 35 98 9d 45 31 c0 e9 b5 fe ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 c7 c7 08 76 cc 9d e8 4b 27 1a 00 <0f> 0b c3 cc cc cc cc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 8b
      [ 3708.457235] RSP: 0018:ffffb9c02bb17e90 EFLAGS: 00010286
      [ 3708.457236] RAX: 0000000000000000 RBX: ffff97b1a0f59440 RCX: 0000000000000027
      [ 3708.457237] RDX: ffff97ea3f2198e8 RSI: 0000000000000001 RDI: ffff97ea3f2198e0
      [ 3708.457238] RBP: ffff97ab52164268 R08: 0000000000000000 R09: 00000000ffff7fff
      [ 3708.457239] R10: ffffb9c02bb17d30 R11: ffffffff9e5e9268 R12: ffff97ea3f22a500
      [ 3708.457240] R13: ffff97ea3f230900 R14: 0000000000000000 R15: ffff97ea3f230905
      [ 3708.457241] FS:  0000000000000000(0000) GS:ffff97ea3f200000(0000) knlGS:0000000000000000
      [ 3708.457242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3708.457243] CR2: 000000c004bda000 CR3: 000000087bfe2002 CR4: 00000000007706e0
      [ 3708.457244] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 3708.457244] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 3708.457245] PKRU: 55555554
      [ 3708.457246] Call Trace:
      [ 3708.457246]  <TASK>
      [ 3708.457246]  process_one_work+0x1e2/0x3b0
      [ 3708.457249]  ? rescuer_thread+0x390/0x390
      [ 3708.457250]  worker_thread+0x50/0x3a0
      [ 3708.457252]  ? rescuer_thread+0x390/0x390
      [ 3708.457254]  kthread+0xd6/0x100
      [ 3708.457256]  ? kthread_complete_and_exit+0x20/0x20
      [ 3708.457258]  ret_from_fork+0x1f/0x30
      [ 3708.457260]  </TASK>
      [ 3708.457260] ---[ end trace a57188da0b5d3ab3 ]---
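
To get an overview of a full dmesg capture without reading every trace, a small script along these lines could count the WARNING lines per source location and CPU. This is only a sketch: the regular expression is written against the line format shown in the excerpt above, and `summarize_warnings` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 3708.455829] WARNING: CPU: 0 PID: 44628 at net/core/devlink.c:9832 devlink_port_type_warn+0x11/0x20
WARNING_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) "
    r"at (?P<where>\S+) (?P<func>\S+)"
)


def summarize_warnings(dmesg_text: str) -> Counter:
    """Count kernel WARNING lines per (source location, function, CPU)."""
    counts: Counter = Counter()
    for match in WARNING_RE.finditer(dmesg_text):
        key = (match["where"], match["func"], int(match["cpu"]))
        counts[key] += 1
    return counts


# The four WARNING lines copied from the excerpt above.
sample = """\
[ 3708.455829] WARNING: CPU: 0 PID: 44628 at net/core/devlink.c:9832 devlink_port_type_warn+0x11/0x20
[ 3708.456019] WARNING: CPU: 0 PID: 44628 at net/core/devlink.c:9832 devlink_port_type_warn+0x11/0x20
[ 3708.457026] WARNING: CPU: 20 PID: 155488 at net/core/devlink.c:9832 devlink_port_type_warn+0x11/0x20
[ 3708.457177] WARNING: CPU: 20 PID: 155488 at net/core/devlink.c:9832 devlink_port_type_warn+0x11/0x20
"""

summary = summarize_warnings(sample)
for (where, func, cpu), n in sorted(summary.items()):
    print(f"{n}x {func} at {where} on CPU {cpu}")
```

To run it against one of the attached dmesg files, pass the file contents instead of `sample`, for example `summarize_warnings(open("dmesg-worker-0.txt").read())`, where the file name is a placeholder.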
      
      After these warnings, kernel taint bit 9 (kernel issued warning) was set, which corresponds to a value of 512 (2^9):
      $ cat /proc/sys/kernel/tainted
      512
      
      The same happened in the other worker nodes.
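
The value 512 can be decoded bit by bit to confirm which taint flag it represents. The sketch below does this for a small subset of the flags described in the kernel's tainted-kernels documentation; the table is deliberately incomplete, and `decode_taint` is a hypothetical helper name.

```python
# Subset of taint flags (bit number -> letter, meaning). Not exhaustive;
# bits missing from this table are reported as "not listed in this sketch".
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    9: ("W", "kernel issued warning"),
    12: ("O", "externally-built (out-of-tree) module was loaded"),
    13: ("E", "unsigned module was loaded"),
}


def decode_taint(value: int) -> list[tuple[int, str]]:
    """Return (bit, description) for every bit set in the taint bitmask."""
    result = []
    bit = 0
    while value >> bit:
        if (value >> bit) & 1:
            letter, text = TAINT_FLAGS.get(bit, ("?", "not listed in this sketch"))
            result.append((bit, f"{letter}: {text}"))
        bit += 1
    return result


# Value reported by /proc/sys/kernel/tainted on the affected worker node.
print(decode_taint(512))
```

Bit 9 is the only bit set in 512, which matches the `W` flag shown in the `Tainted: G        W` lines of the traces.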
          

      Expected results:

      No kernel taint should be set on the OpenShift nodes. This is the behavior observed in OCP 4.14.60 and in z-stream versions of higher OCP releases.
          

      Additional info:

      - Servers: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10
      - NICs: Intel XXV710 family
      - A must-gather archive from the cluster is available from this DCI job: https://www.distributed-ci.io/jobs/9db2a166-1c0b-47c8-9e26-b9bebfa7d7f0/files (in the Files section, download must_gather.tar.gz)
      - The dmesg output from the three worker nodes is also attached; the error messages related to the kernel taint appear at the end of each file: https://drive.google.com/drive/folders/1VtXw6KVivg9TqH-LsOaZ3zgqSullZwc5?usp=sharing
          

              Assignee: Unassigned
              Reporter: Ramon Perez (raperez@redhat.com)
              Michael Nguyen