
ModemManager marks the Sierra MC7304 modem as invalid and removes it from the bus after 10 consecutive timeouts during a download/transfer.

    • No
    • Important
    • 1
    • rhel-sst-network-management
    • ssg_networking
    • None
    • Customer/Partner: Die Schweizerische Post AG
      Jira ID: RHEL-59027
      Customer Case: 03911368
      Status Details: This issue involves the Sierra MC7304 modem, which is invalidated and removed from the bus after 10 consecutive QMI timeouts during large file downloads, disrupting connectivity. The problem is not observed during uploads and is specific to ModemManager v1.20.2. A patch disabling QMI timeouts resolves the issue but is not considered a viable long-term solution. The next step is to investigate the root cause of the timeouts and explore adding a user-configurable option in mmcli to adjust timeout retry settings based on different modem use cases. This work will be started as soon as an engineer finishes the current work in progress in the sprint.
       
      [2024-10-14] The work is currently being handled in the current sprint. Discussion has also started upstream to help in finding the root cause. 
       
      [2024-10-21] The work is still being handled in the current sprint, and the upstream discussion continues in order to clarify the current behavior and steer toward the root cause.
    • False
    • None
    • None
    • Red Hat Enterprise Linux
    • NMT SST - Future releases
    • Definition of Done:

      Please mark each item below with ( / ) if completed or ( x ) if incomplete:

      ( ) The acceptance criteria defined below are met.

      Given a system administrator managing cellular modems through ModemManager,

      When using mmcli or a config file on a device with known QMI timeout sensitivity,

      Then they should be able to adjust or disable the timeout retry threshold per-modem to prevent the modem from being marked invalid during high-throughput transfers.


      ( ) Code changes are included in a downstream build attached to an errata.


      ( ) All required testing (manual and/or automated) passes successfully.


      ( ) Related documentation updates (if applicable) have been completed.

    • None
    • None
    • x86_64
    • None

      Issue :

      ModemManager invalidates the Sierra MC7304 modem. When a large file is downloaded from the board computer, the modem on the board computer is invalidated after 10 consecutive QMI timeouts. The issue is not seen during uploads.

       

      ModemManager[1399]: <error> [modem0] port cdc-wdm3 timed out 10 consecutive times, marking modem as invalid 

      After the modem is marked invalid, it is removed from the bus.

      $ mmcli -m 0
      error: couldn't find modem 

       

      ModemManager[1398]: <error> [modem0] port cdc-wdm3 timed out 10 consecutive times, marking modem as invalid
      ModemManager[1398]: <debug> [/dev/cdc-wdm3] number of consecutive timeouts: 10
      ModemManager[1398]: <debug> [/dev/cdc-wdm3] transaction 0x29 aborted, but message is not abortable
      ModemManager[1398]: <warn>  [modem0/bearer3] reloading stats failed: QMI operation failed: Transaction timed out
      ModemManager[1398]: <debug> [modem0/bearer1] removing from bus
      ModemManager[1398]: <debug> [modem0/bearer3] removing from bus
      ModemManager[1398]: <debug> [device /sys/devices/pci0000:00/0000:00:14.0/usb2/2-4] unexported modem from path '/org/freedesktop/ModemManager1/Modem/0'
      ModemManager[1398]: <debug> [modem0/wwp0s20u4i10/net] port now disconnected 

      This behavior disrupts the transfer.
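To spot this sequence across many board computers, the log lines above can be scanned mechanically. The following is a minimal sketch, assuming the messages keep exactly the wording quoted in this report; the function name `find_invalidations` and the sample list are hypothetical helpers for illustration and are not part of ModemManager.

```python
import re

# Patterns derived only from the log lines quoted in this report.
INVALID_RE = re.compile(
    r"ModemManager\[(?P<pid>\d+)\]: <error> \[(?P<modem>modem\d+)\] "
    r"port (?P<port>\S+) timed out (?P<count>\d+) consecutive times, "
    r"marking modem as invalid"
)
UNEXPORT_RE = re.compile(r"unexported modem from path '(?P<path>[^']+)'")


def find_invalidations(lines):
    """Return one dict per invalidation message.

    Each dict also records the D-Bus path unexported afterwards, if the
    log shows one before the next invalidation message.
    """
    events = []
    for line in lines:
        invalid = INVALID_RE.search(line)
        if invalid:
            events.append({
                "modem": invalid["modem"],
                "port": invalid["port"],
                "count": int(invalid["count"]),
                "unexported_path": None,
            })
            continue
        unexport = UNEXPORT_RE.search(line)
        if unexport and events and events[-1]["unexported_path"] is None:
            events[-1]["unexported_path"] = unexport["path"]
    return events


# Sample input copied from the log excerpt in this report.
SAMPLE = [
    "ModemManager[1398]: <error> [modem0] port cdc-wdm3 timed out 10 consecutive times, marking modem as invalid",
    "ModemManager[1398]: <debug> [/dev/cdc-wdm3] number of consecutive timeouts: 10",
    "ModemManager[1398]: <debug> [modem0/bearer1] removing from bus",
    "ModemManager[1398]: <debug> [device /sys/devices/pci0000:00/0000:00:14.0/usb2/2-4] unexported modem from path '/org/freedesktop/ModemManager1/Modem/0'",
]

events = find_invalidations(SAMPLE)
```

Running it over a journal export would give one entry per modem loss, which could help correlate the losses with download activity.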

       

       

      Modem information :

       

        Hardware |           manufacturer: Sierra Wireless, Incorporated
                 |                  model: MC7304
                 |      firmware revision: SWI9X15C_06.03.32.02 r26426 CNSHZ-AR-BUILD 2015/01/16 01:32:41
                 |           h/w revision: 1.0
                 |              supported: gsm-umts, lte
                 |                current: gsm-umts, lte
                 |           equipment id: 356853056551122 

       

       

      Observation & Analysis :

      • The issue is seen with the latest ModemManager-1.20.2-1.el9.
      • The issue is not seen with the older ModemManager-1.18.2-3.el9.
      • Both versions were also tested on RHEL 8, and the same pattern is seen.
      • QMI timeout warnings are also logged with ModemManager <= 1.18.2, but the transfer completes without issue.
      • A modem from another vendor, Telit LEPCIC4EU13T130H00, shows no issue with ModemManager-1.20.2-1.el9.

       

      To understand how the modem behaves when the timeout is disabled, I created a patch (attached as 0001-modem-disable-QMI-timeouts.patch and shown below) and built an RPM with it in Brew. The results are good: the modem is no longer removed from the bus, and the transfer completes fine according to the customer.

       

      $ cat 0001-modem-disable-QMI-timeouts.patch 
      From 30c7b85b0677f17bcd38325f16c8f5813694e540 Mon Sep 17 00:00:00 2001
      From: Abhishek Rawal <arawal@redhat.com>
      Date: Wed, 11 Sep 2024 23:00:33 +0530
      Subject: [PATCH] modem: disable QMI timeouts
      
      ---
       src/mm-broadband-modem-qmi.c | 1 +
       1 file changed, 1 insertion(+)
      
      diff --git a/src/mm-broadband-modem-qmi.c b/src/mm-broadband-modem-qmi.c
      index 98868fa..66b92e9 100644
      --- a/src/mm-broadband-modem-qmi.c
      +++ b/src/mm-broadband-modem-qmi.c
      @@ -13516,6 +13516,7 @@ mm_broadband_modem_qmi_new (const gchar *device,
                                MM_BASE_MODEM_PLUGIN, plugin,
                                MM_BASE_MODEM_VENDOR_ID, vendor_id,
                                MM_BASE_MODEM_PRODUCT_ID, product_id,
      +			 MM_BASE_MODEM_MAX_TIMEOUTS, 0,
                                /* QMI bearer supports NET only */
                                MM_BASE_MODEM_DATA_NET_SUPPORTED, TRUE,
                                MM_BASE_MODEM_DATA_TTY_SUPPORTED, FALSE,
      -- 
      2.46.0 

      Test rpms can be found at : https://download.eng.bos.redhat.com/brewroot/work/tasks/4707/64144707/

       

       

      However, we don't think this is the right solution, because it disables the timeout for all QMI modems, even though the other (Telit) modem works fine.
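To make the effect of the patch concrete, here is a toy model of a consecutive-timeout threshold. It is a sketch under stated assumptions, not ModemManager's implementation: it assumes the counter resets whenever a reply arrives and that a threshold of 0 disables invalidation, which is the behavior the patch relies on. The names `TimeoutTracker` and `survives` are hypothetical.

```python
class TimeoutTracker:
    """Toy model of a consecutive-timeout threshold (not ModemManager code)."""

    def __init__(self, max_timeouts):
        # Assumption: 0 means "never invalidate", as the patch relies on.
        self.max_timeouts = max_timeouts
        self.consecutive = 0
        self.valid = True

    def on_reply(self):
        # Assumption: any successful reply resets the consecutive counter.
        self.consecutive = 0

    def on_timeout(self):
        self.consecutive += 1
        if self.max_timeouts > 0 and self.consecutive >= self.max_timeouts:
            self.valid = False
        return self.valid


def survives(events, max_timeouts):
    """Return True if the modelled modem stays valid over the event sequence."""
    tracker = TimeoutTracker(max_timeouts)
    for event in events:
        if event == "timeout":
            tracker.on_timeout()
        else:
            tracker.on_reply()
        if not tracker.valid:
            return False
    return True


ten_timeouts = ["timeout"] * 10
interrupted = ["timeout"] * 9 + ["reply"] + ["timeout"] * 9

default_result = survives(ten_timeouts, 10)      # threshold reached
patched_result = survives(ten_timeouts, 0)       # threshold disabled
interrupted_result = survives(interrupted, 10)   # counter reset by a reply
```

In this model a per-modem threshold would only change the `max_timeouts` argument for the affected device, which is the kind of tunable requested in this report, whereas the patch sets it to 0 for every QMI modem.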

       

      Impact :

      There are 1800 board computers that use the Sierra modem in question. The customer loses the modem on about 2 devices per day. Affected systems are unable to reliably send telemetry to their backend, which might delay ticket sales, passenger WiFi, and collection of passenger counting data.

      Steps to reproduce

      The customer reproduces the issue easily by performing a download.

       

       

      Attachments :

      The sosreport contains ModemManager debug logs of the issue on ModemManager-1.20.2-1.el9. ModemManager-test-success.log is from testing the patched RPM (timeout disabled). sosreport-MM-1-18.tar.gz contains the logs for ModemManager v1.18.

       

       

       

      The patch we supplied works because it disables timeouts for QMI modems; however, I don't think it is the correct solution. Requesting your insights and help to provide the right solution/answer to the customer.

      1. Why is the modem timing out in the first place?
      2. Is it possible to expose a user-level tunable in mmcli to adjust the timeout retries to the needs of different modems?

       

              rh-ee-sfaye Stanislas Faye
              rhn-support-arawal Abhishek Rawal
              Lubomir Rintel
              Network Management Team Network Management Team
              Vladimir Benes Vladimir Benes