Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-38614

Hypershift - missing ServiceMonitor for OVN

XMLWordPrintable

    • Quality / Stability / Reliability
    • False
    • Hide

      None

      Show
      None
    • None
    • None
    • None
    • None
    • None
    • None
    • Hypershift Sprint 259, Hypershift Sprint 260, Hypershift Sprint 263
    • 3
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      • Are we aware of a misfiring alert of 'NoRunningOvnControlPlane' due to no ServiceMonitor for ovn on the hosted cluster?

      Aka we're missing a ServiceMonitor for OVN

      oc get servicemonitor -n clusters-yonah
      NAME                                  AGE
      catalog-operator                      6d23h
      cluster-version-operator              6d23h
      etcd                                  6d23h
      kube-apiserver                        6d23h
      kube-controller-manager               6d23h
      monitor-multus-admission-controller   6d23h
      monitor-ovn-control-plane-metrics     6d23h
      node-tuning-operator                  6d23h
      olm-operator                          6d23h
      openshift-apiserver                   6d23h
      openshift-controller-manager          6d23h
      openshift-route-controller-manager    6d23h 

       

      oc get prometheusrule master-rules -o yaml -n clusters-yonah | grep NoRunningOvnControlPlane -A 14
          - alert: NoRunningOvnControlPlane
            annotations:
              description: |
                Networking control plane is degraded. Networking configuration updates applied to the cluster will not be
                implemented while there are no OVN Kubernetes pods.
              runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-network-operator/NoRunningOvnMaster.md
              summary: There is no running ovn-kubernetes control plane.
            expr: |
              absent(up{job="ovnkube-control-plane", namespace="clusters-yonah"} == 1)
            for: 5m
            labels:
              namespace: clusters-yonah
              severity: critical
          - alert: NoOvnClusterManagerLeader
            annotations:
      iwatson@iwatson-mac ~ % oc get prometheusrule master-rules -o yaml | grep NoRunningOvnControlPlane -A 12
          - alert: NoRunningOvnControlPlane
            annotations:
              description: |
                Networking control plane is degraded. Networking configuration updates applied to the cluster will not be
                implemented while there are no OVN Kubernetes pods.
              runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-network-operator/NoRunningOvnMaster.md
              summary: There is no running ovn-kubernetes control plane.
            expr: |
              absent(up{job="ovnkube-control-plane", namespace="clusters-yonah"} == 1)
            for: 5m
            labels:
              namespace: clusters-yonah
              severity: critical 

      This is causing the alert NoRunningOvnControlPlane to fire despite the OVN control plane being available. 

      oc get pods -n clusters-yonah | grep ovn
      ovnkube-control-plane-58659f5cd9-2sc2m                3/3     Running     1 (16h ago)     6d23h
      ovnkube-control-plane-58659f5cd9-57q24                3/3     Running     0               6d23h 

              rh-ee-bclement Borja Clemente Castanera
              rhn-support-mrobson Matt Robson
              None
              None
              Jie Zhao Jie Zhao
              None
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

                Created:
                Updated:
                Resolved: