OpenShift Bugs: OCPBUGS-35338

Azure CPMS periodics are failing due to non-retryable API errors


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Normal
    • 4.14.z
    • 4.17.0
    • None
    • No
    • False
    • None
    • Release Note Not Required
    • In Progress

      This is a clone of issue OCPBUGS-35255. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-35227. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-35069. The following is the description of the original issue:

      Description of problem:

      Reviewing https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&arch=amd64&baseEndTime=2024-02-28%2023%3A59%3A59&baseRelease=4.15&baseStartTime=2024-02-01%2000%3A00%3A00&capability=operator-conditions&component=Cloud%20Compute%20%2F%20Other%20Provider&confidence=95&environment=ovn%20no-upgrade%20amd64%20azure%20standard&excludeArches=arm64%2Cheterogeneous%2Cppc64le%2Cs390x&excludeClouds=openstack%2Cibmcloud%2Clibvirt%2Covirt%2Cunknown&excludeVariants=hypershift%2Cosd%2Cmicroshift%2Ctechpreview%2Csingle-node%2Cassisted%2Ccompact&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&network=ovn&pity=5&platform=azure&platform=azure&sampleEndTime=2024-06-05%2023%3A59%3A59&sampleRelease=4.15&sampleStartTime=2024-05-30%2000%3A00%3A00&testId=Operator%20results%3A6d9ee55972f66121016367d07d52f0a9&testName=operator%20conditions%20control-plane-machine-set&upgrade=no-upgrade&upgrade=no-upgrade&variant=standard&variant=standard, it appears that the Azure tests are frequently failing with "Told to stop trying": the check failed before the until condition passed.
      
      On closer inspection, the rollout happened as expected, but the until function received a non-retryable error and exited, while the check saw that the deletion timestamp was set and that the Machine had gone into Running, which caused it to fail.
      
      We should investigate why the until function failed in this case. It should have seen the same machines as the check, and therefore a Running machine, and passed.

      Version-Release number of selected component (if applicable):

       

      How reproducible:

       

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

       

      Expected results:

       

      Additional info:

       

              Joel Speed (joelspeed)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Zhaohua Sun
              Votes: 0
              Watchers: 4