OCPBUGS-44970

Loki on SNO throws excessive restarts while waiting for DNS deployment


    • Moderate
    • None
    • 0
    • OCPEDGE Sprint 264
    • 1
    • Proposed
    • False
    • Release Note Not Required
    • In Progress

      Context Thread

      As a maintainer of the SNO CI lane, I would like to ensure that the following test doesn't fail regularly in SNO CI:

      [sig-architecture] platform pods in ns/openshift-e2e-loki should not exit an excessive amount of times
      

      This issue is a symptom of a greater problem with SNO: after the upgrade reboot, DNS resolution is unavailable for a period because the DNS operator has an outage while it is deploying the new DNS pods. During that time, Loki exits after hitting the following error:

      2024/10/23 07:21:32 OIDC provider initialization failed: Get "https://sso.redhat.com/auth/realms/redhat-external/.well-known/openid-configuration": dial tcp: lookup sso.redhat.com on 172.30.0.10:53: read udp 10.128.0.4:53104->172.30.0.10:53: read: connection refused
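
      The failure mode above (exit on the first failed lookup) can be contrasted with a retry-with-backoff approach. The following is only a minimal sketch of that idea, not the fix chosen for this issue and not code from Loki; the function name wait_for_dns, its parameters, and the attempt and delay values are hypothetical.

```python
import socket
import time


def wait_for_dns(host, attempts=5, base_delay=1.0,
                 resolve=socket.getaddrinfo, sleep=time.sleep):
    """Retry DNS resolution with exponential backoff.

    Hypothetical helper: returns True once `host` resolves, False if
    every attempt fails. `resolve` and `sleep` are injectable so the
    logic can be exercised without a live DNS server.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            resolve(host, 443)
            return True
        except OSError as err:  # socket.gaierror is a subclass of OSError
            print(f"attempt {attempt}/{attempts}: lookup {host} failed: {err}")
            if attempt < attempts:
                sleep(delay)
                delay *= 2  # double the wait before the next attempt
    return False
```

      A caller would run wait_for_dns("sso.redhat.com") before OIDC provider initialization and exit only if it returns False, so a short DNS outage costs a few seconds of waiting rather than a container restart.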
      

      This issue is important because it can contribute to payload rejection in our blocking CI jobs.

      Acceptance Criteria:

      • The problem is discussed with the networking team to understand the best path to resolution, and the decision is documented.
      • Either the DNS operator or the test is adjusted to address or mitigate the issue.
      • CI is free from the issue in test results for an extended period. (We first need to confirm how often we're seeing it before this period can be defined with confidence.)

              jpoulin Jeremy Poulin
              jpoulin Jeremy Poulin
              Neil Hamza Neil Hamza