  1. OCP Technical Release Team
  2. TRT-1576

CI: fail update suite if any ClusterOperator goes Available=False outside of updates


    • Type: Story
    • Resolution: Unresolved
    • Priority: Major

      Available=False conditions are alarming and may frighten customers, and we don't want to see them in our own controlled, repeatable update CI. This example job had logs like:

      : [bz-Image Registry] clusteroperator/image-registry should not change condition/Available
      Run #0: Failed  1h58m17s
      {  0 unexpected clusteroperator state transitions during e2e test run, as desired.
      3 unwelcome but acceptable clusteroperator state transitions during e2e test run.  These should not happen, but because they are tied to exceptions, the fact that they did happen is not sufficient to cause this test-case to fail:
      
      Jan 09 12:43:04.348 E clusteroperator/image-registry condition/Available reason/NoReplicasAvailable status/False Available: The deployment does not have available replicas\nNodeCADaemonAvailable: The daemon set node-ca has available replicas\nImagePrunerAvailable: Pruner CronJob has been created (exception: We are not worried about Available=False or Degraded=True blips for stable-system tests yet.)
      Jan 09 12:43:04.348 - 56s   E clusteroperator/image-registry condition/Available reason/NoReplicasAvailable status/False Available: The deployment does not have available replicas\nNodeCADaemonAvailable: The daemon set node-ca has available replicas\nImagePrunerAvailable: Pruner CronJob has been created (exception: We are not worried about Available=False or Degraded=True blips for stable-system tests yet.)
      Jan 09 12:44:00.860 W clusteroperator/image-registry condition/Available reason/Ready status/True Available: The registry is ready\nNodeCADaemonAvailable: The daemon set node-ca has available replicas\nImagePrunerAvailable: Pruner CronJob has been created (exception: Available=True is the happy case)
      }
      

      And the job still passed.

      Definition of done:

      • Same as OTA-362, except filling in here.
      • File bugs, or link the existing issues.
      • If a bug exists, add the test to the exception list.
      • Unless a test is in the exception list, it should fail if we see Available != True.
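
The rule in the last bullet can be sketched as below. This is only an illustration of the intended pass/fail logic, not the actual test code: the `Transition` shape, the `classify` and `should_fail` helpers, and the `exceptions` mapping (operator name to tracking bug) are all hypothetical names chosen for this example, and the bug value is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """One observed ClusterOperator condition change (hypothetical shape)."""
    operator: str
    condition: str
    status: str
    reason: str


def classify(transitions, exceptions):
    """Split Available != True transitions into (unexpected, tolerated).

    `exceptions` maps an operator name to the bug tracking its known
    Available != True behavior. Operators without an entry get no pass.
    """
    unexpected, tolerated = [], []
    for t in transitions:
        if t.condition != "Available" or t.status == "True":
            continue  # Available=True is the happy case; other conditions are out of scope
        if t.operator in exceptions:
            tolerated.append(t)
        else:
            unexpected.append(t)
    return unexpected, tolerated


def should_fail(transitions, exceptions):
    """The test case fails when any unexcepted Available != True was seen."""
    unexpected, _ = classify(transitions, exceptions)
    return len(unexpected) > 0


# Transitions modeled on the image-registry log excerpt in this issue.
observed = [
    Transition("image-registry", "Available", "False", "NoReplicasAvailable"),
    Transition("image-registry", "Available", "True", "Ready"),
]

# With no exception on file, the Available=False blip fails the test case.
print(should_fail(observed, exceptions={}))
# With a (placeholder) bug on file, the blip is tolerated.
print(should_fail(observed, exceptions={"image-registry": "PLACEHOLDER-BUG"}))
```

Keying the exception list by a filed bug keeps each tolerated blip tied to an open issue, which matches the "file bugs first, then add the exception" order of the bullets above.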

            dperique@redhat.com Dennis Periquet
            lmohanty@redhat.com Lalatendu Mohanty