OpenShift Bugs / OCPBUGS-56667

rosa-sts-ovn job blocking 4.20 nightly payloads due to failing sre-replace-packageserver-csv job


    • Bug
    • Resolution: Done
    • Critical
    • 4.20.0
    • HyperShift / ROSA
    • Quality / Stability / Reliability
    • False
    • Important
    • Proposed

      For roughly three days, 4.20 nightlies have been permafailing on this job because of the following test:

      [sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]

        [
            {
              "metric": {
                "__name__": "ALERTS",
                "alertname": "KubeJobFailed",
                "alertstate": "firing",
                "condition": "true",
                "container": "kube-rbac-proxy-main",
                "endpoint": "https-main",
                "job": "kube-state-metrics",
                "job_name": "sre-replace-packageserver-csv",
                "namespace": "openshift-operator-lifecycle-manager",
                "prometheus": "openshift-monitoring/k8s",
                "service": "kube-state-metrics",
                "severity": "warning"
              },
              "value": [
                1748242812.353,
                "1"
              ]
            }
          ]
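
      The test passes only if no alert other than the two named in its title is firing, so a single KubeJobFailed alert for the sre-replace-packageserver-csv job is enough to fail it. The Python sketch below illustrates that filtering logic under stated assumptions; it is not the test suite's actual implementation. It assumes the query result is a list of samples shaped like the one above, and the function name unexpected_firing_alerts is hypothetical. The second sample (Watchdog) is also hypothetical, added only to show the allowlist at work; it is not taken from the failing run.

```python
import json

# Alerts the test tolerates, taken from the test name quoted above.
ALLOWED_ALERTS = frozenset({"Watchdog", "AlertmanagerReceiversNotConfigured"})


def unexpected_firing_alerts(samples):
    """Hypothetical helper: return (alertname, namespace, job_name) tuples for
    every firing ALERTS sample whose alertname is not in the allowlist."""
    offending = []
    for sample in samples:
        labels = sample["metric"]
        # Only alerts in the "firing" state count; pending alerts are skipped.
        if labels.get("alertstate") != "firing":
            continue
        # Alerts expected on every cluster are tolerated.
        if labels.get("alertname") in ALLOWED_ALERTS:
            continue
        offending.append(
            (labels.get("alertname"), labels.get("namespace"), labels.get("job_name"))
        )
    return offending


# First sample: abbreviated from the failing run above.
# Second sample: hypothetical Watchdog alert, added for illustration only.
SAMPLES = json.loads("""
[
  {"metric": {"__name__": "ALERTS", "alertname": "KubeJobFailed",
              "alertstate": "firing",
              "job_name": "sre-replace-packageserver-csv",
              "namespace": "openshift-operator-lifecycle-manager",
              "severity": "warning"},
   "value": [1748242812.353, "1"]},
  {"metric": {"__name__": "ALERTS", "alertname": "Watchdog",
              "alertstate": "firing", "severity": "none"},
   "value": [1748242812.353, "1"]}
]
""")

if __name__ == "__main__":
    for alert in unexpected_firing_alerts(SAMPLES):
        print("unexpected firing alert:", alert)
```

      Running it reports only the KubeJobFailed alert; the Watchdog sample is filtered out by the allowlist, which is why any failed Job in the cluster surfaces as a failure of this test.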
      

      Could this be related to a late change the OLM teams are trying to get into 4.19 to update their catalog versions?

      Example job failure: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.20-e2e-rosa-sts-ovn/1926847779737440256

      Once this is addressed, please check whether there is a brittle SRE job here that could be improved to prevent this from recurring.

              Amarthya Valija (rh-ee-avalija)
              Devan Goodwin (rhn-engineering-dgoodwin)
              Jie Zhao