OCP Technical Release Team / TRT-1539

Loki outages should not fail tests

    • Type: Bug
    • Resolution: Done
    • Priority: Critical

      Yesterday, a major DPCR outage, and with it a Loki outage, took the system down entirely. As a result, one test failed:

      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.16-upgrade-from-stable-4.15-e2e-aws-ovn-upgrade/1762573177050894336

      [sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]

          [
            {
              "metric": {
                "__name__": "ALERTS",
                "alertname": "KubeDaemonSetRolloutStuck",
                "alertstate": "firing",
                "container": "kube-rbac-proxy-main",
                "daemonset": "loki-promtail",
                "endpoint": "https-main",
                "job": "kube-state-metrics",
                "namespace": "openshift-e2e-loki",
                "prometheus": "openshift-monitoring/k8s",
                "service": "kube-state-metrics",
                "severity": "warning"
              },
              "value": [
                1709071917.851,
                "1"
              ]
            }
          ]
      

      The query this test uses should be adapted to omit every alert in the openshift-e2e-loki namespace, so that a Loki outage cannot fail the test.
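
      A minimal sketch of that exclusion, in Python for illustration only. The test's actual query is not shown in this issue, so the selector below is an assumption built from the test name and the alert labels above. The helper names firing_alerts_query and drop_excluded are hypothetical. The names passed in are assumed to contain no regex metacharacters.

```python
# Hypothetical sketch: the real test query is not quoted in this issue.
ALLOWED_ALERTS = ["Watchdog", "AlertmanagerReceiversNotConfigured"]
EXCLUDED_NAMESPACES = ["openshift-e2e-loki"]


def firing_alerts_query(allowed_alerts, excluded_namespaces):
    """Build a PromQL selector for firing alerts.

    Alerts named in allowed_alerts and alerts from excluded_namespaces
    are filtered out with negative regex matchers (!~), which PromQL
    anchors to the whole label value.
    """
    matchers = ['alertstate="firing"']
    if allowed_alerts:
        matchers.append('alertname!~"%s"' % "|".join(allowed_alerts))
    if excluded_namespaces:
        matchers.append('namespace!~"%s"' % "|".join(excluded_namespaces))
    return "ALERTS{%s}" % ",".join(matchers)


def drop_excluded(results, excluded_namespaces):
    """Client-side alternative: drop query results from excluded namespaces.

    results is a list shaped like the JSON above, where each entry has a
    "metric" dict of labels.
    """
    excluded = set(excluded_namespaces)
    return [
        r for r in results
        if r.get("metric", {}).get("namespace") not in excluded
    ]


# Trimmed copy of the result shown above.
sample_results = [
    {
        "metric": {
            "__name__": "ALERTS",
            "alertname": "KubeDaemonSetRolloutStuck",
            "alertstate": "firing",
            "namespace": "openshift-e2e-loki",
            "severity": "warning",
        },
        "value": [1709071917.851, "1"],
    }
]

query = firing_alerts_query(ALLOWED_ALERTS, EXCLUDED_NAMESPACES)
remaining = drop_excluded(sample_results, EXCLUDED_NAMESPACES)
print(query)
print(len(remaining))
```

      Filtering in the query keeps the exclusion in one place. The client-side filter is an alternative if the query itself cannot easily be changed.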

      Ideally the fix would be backported, but if that is too cumbersome we could fix it going forward only.

            rhn-engineering-dgoodwin Devan Goodwin