OCP Technical Release Team / TRT-797

watcher: investigate disruption increase on azure


    • Type: Bug
    • Resolution: Done
    • Priority: Minor

      In looking at this 4.13 nightly payload, this Azure aggregated job failed miserably.

      On one of the jobs, I saw:

      [bz-Networking][invariant] alert/KubePodNotReady should not be at or above info in ns/openshift-multus
      
      {  KubePodNotReady was at or above info for at least 10s on platformidentification.JobType{Release:"4.13", FromRelease:"", Platform:"azure", Architecture:"amd64", Network:"ovn", Topology:"ha"} (maxAllowed=0s): pending for 0s, firing for 10s:
      
      Jan 25 05:24:36.090 - 10s   W alert/KubePodNotReady ns/openshift-multus pod/network-metrics-daemon-95w7p ALERTS{alertname="KubePodNotReady", alertstate="firing", namespace="openshift-multus", pod="network-metrics-daemon-95w7p", prometheus="openshift-monitoring/k8s", severity="warning"}}
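
      For context, here is a minimal sketch (my own illustration, not the actual origin test code) of the comparison this invariant makes: the observed firing duration of the alert is checked against an allowed budget, and for KubePodNotReady in openshift-multus that budget is zero, so the 10s of firing above is enough to fail the test.

      package main

      import (
          "fmt"
          "time"
      )

      func main() {
          // Values taken from the failure output above; maxAllowed is the budget
          // the test grants this alert/namespace combination.
          maxAllowed := 0 * time.Second
          firing := 10 * time.Second

          if firing > maxAllowed {
              fmt.Printf("KubePodNotReady was at or above info for at least %s (maxAllowed=%s): fail\n",
                  firing, maxAllowed)
          }
      }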
      
      

      That did not fail due to image pull backoff as it did in TRT-796.

      I saw a lot of:

      "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?" pod="openshift-multus/network-metrics-daemon-95w7p" podUID=b9571c8c-9c12-41a8-b13f-dc090d54c771
      
      

              Assignee: Dennis Periquet (dperique@redhat.com)
              Reporter: Dennis Periquet (dperique@redhat.com)
              Votes: 0
              Watchers: 4

                Created:
                Updated:
                Resolved: