  1. Migration Toolkit for Virtualization
  2. MTV-3934

Forklift Controller Fails to Pick Up Rotated Token Immediately, Causing ~30m Outage


    • Bug
    • Resolution: Unresolved
    • Normal
    • None
    • 2.10.0
    • None
    • None
    • Quality / Stability / Reliability
    • False
    • False
    • Important

      Description of problem:

      The forklift-controller (specifically the inventory container) does not detect or apply the rotated ManagedServiceAccount token when it is rotated.

      As a result, the controller keeps using the old token until it hard-expires, which leads to authentication failures and provider inactivity.

      The controller eventually reconciles and picks up the new token after a significant delay (approx. 30 minutes in our test), without a pod restart.

      The issue is observed on every rotation interval (1 hour in this case).

      Version-Release number of selected component (if applicable):

      MTV 2.10.0

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Deploy OCP 4.20.3/ACM 2.15 RC (az)
      2. Deploy/import a spoke (az)
      3. Install the CNV and MTV operators on the hub, either by patching the MCH:

      oc patch multiclusterhub ... --type=merge -p '{"spec": {"overrides": {"components": [{"name": "cnv-mtv-integrations-preview", "enabled": true}]}}}'

      or by another method.
      
      4. Install CNV on spoke via label: oc label managedcluster <spoke-name> acm/cnv-operator-install=true
      
      5. Wait for the token rotation. With a validity of 1 hour, rotation takes place at around minute 48 (80% of the token's lifetime; from my observation, ManagedServiceAccount leaves some headroom before expiry).
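The timing in step 5 can be worked out as a small calculation. This is only a sketch: the 0.8 ratio is the value observed in this report, not a documented ManagedServiceAccount constant, and `rotation_timeline` is a hypothetical helper name.

```python
from datetime import timedelta


def rotation_timeline(validity: timedelta, rotate_fraction: float = 0.8):
    """Return (rotation offset, overlap window) for a token of the given validity.

    rotate_fraction=0.8 is the ratio observed in this report (assumption).
    """
    rotate_at = validity * rotate_fraction
    # Time during which both the old and the new token are valid.
    overlap = validity - rotate_at
    return rotate_at, overlap


rotate_at, overlap = rotation_timeline(timedelta(hours=1))
print(f"rotation at minute {rotate_at.total_seconds() / 60:.0f}")
print(f"old token remains valid for {overlap.total_seconds() / 60:.0f} more minutes")
```

With a 1 hour validity this gives a 12 minute window in which the controller could switch tokens without any failed request.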

      Actual results:

      Forklift Controller logs (inventory container) show repeated failures starting exactly at token expiration time and persisting for ~36 minutes.

      From the container:
      {"level":"info","ts":"2025-11-29 09:51:10.123","logger":"provider","msg":"Connection test failed.","reason":"Unauthorized","provider":"mtv-integrations/az-virt-215-mtv"}
      {"level":"info","ts":"2025-11-29 09:51:20.456","logger":"provider","msg":"Connection test failed.","reason":"Unauthorized","provider":"mtv-integrations/az-virt-215-mtv"}
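To measure the failure window from a full log capture, the JSON lines can be filtered on `"reason":"Unauthorized"`. The sketch below uses only the two lines quoted above as input; `unauthorized_window` is a hypothetical helper, and the timestamp format is assumed from those two lines.

```python
import json
from datetime import datetime

LOG_LINES = [
    '{"level":"info","ts":"2025-11-29 09:51:10.123","logger":"provider","msg":"Connection test failed.","reason":"Unauthorized","provider":"mtv-integrations/az-virt-215-mtv"}',
    '{"level":"info","ts":"2025-11-29 09:51:20.456","logger":"provider","msg":"Connection test failed.","reason":"Unauthorized","provider":"mtv-integrations/az-virt-215-mtv"}',
]


def unauthorized_window(lines):
    """Return (first, last) timestamps of Unauthorized failures, or None if there are none."""
    stamps = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if isinstance(entry, dict) and entry.get("reason") == "Unauthorized":
            stamps.append(datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S.%f"))
    if not stamps:
        return None
    return min(stamps), max(stamps)


first, last = unauthorized_window(LOG_LINES)
print(f"Unauthorized from {first} to {last} ({(last - first).total_seconds():.3f}s)")
```

Run against the complete inventory container log, the same function would report the start and end of the outage for each rotation interval.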

      Expected results:

      The forklift-controller should detect the rotated token (available from minute 48) and seamlessly switch to it before the old token expires at minute 60, resulting in zero downtime.
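One way to get the expected behaviour is to re-read the token at an interval shorter than the overlap window (12 minutes here). The sketch below is illustrative only and is not the forklift-controller implementation: `RefreshingToken`, the `store` dict standing in for the token source, and the 5 minute refresh interval are all assumptions.

```python
import time
from typing import Callable


class RefreshingToken:
    """Caches a token and re-reads it from `load` once it is older than max_age_s."""

    def __init__(self, load: Callable[[], str], max_age_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load
        self._max_age_s = max_age_s
        self._clock = clock
        self._token = load()
        self._loaded_at = clock()

    def get(self) -> str:
        now = self._clock()
        if now - self._loaded_at >= self._max_age_s:
            self._token = self._load()
            self._loaded_at = now
        return self._token


# Simulated rotation with a fake clock (seconds since token issue).
now = [0.0]
store = {"token": "old"}  # hypothetical stand-in for wherever the token is published
source = RefreshingToken(lambda: store["token"], max_age_s=300, clock=lambda: now[0])

now[0] = 47 * 60
token_at_47 = source.get()   # re-read, still the old token
now[0] = 48 * 60
store["token"] = "new"       # rotation happens at minute 48
now[0] = 50 * 60
token_at_50 = source.get()   # cached copy is only 3 minutes old
now[0] = 53 * 60
token_at_53 = source.get()   # cache older than 5 minutes, re-read picks up the new token
print(token_at_47, token_at_50, token_at_53)
```

Because the refresh interval is shorter than the overlap window, the new token is in use before the old one expires at minute 60.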

      Additional info:

       

      Simple workarounds, such as increasing the token expiry time or restarting the pods, fix the issue.

              gcheresh@redhat.com Genadi Chereshnya
              rhn-support-ashafi Atif Shafi