Bug
Resolution: Obsolete
Normal
4.14
Quality / Stability / Reliability
Moderate
Description of problem:
I have 3 Agent-provider hosted clusters installed, all configured with a metallb-ingress, and everything appears to be working as expected; I can access the hosted Prometheus over the ingress, etc. However, the MCE cluster is reporting this alert for all 3 hosted clusters:
"100% of the ovnkube-control-plane/ovn-kubernetes-control-plane targets in Namespace clusters-test[1-3] namespace have been unreachable for more than 15 minutes. This may be a symptom of network connectivity issues, down nodes, or failures within these components. Assess the health of the infrastructure and nodes running these targets and then contact support."
Is it possible this is a lingering alert state that was not cleared after firing during the initial part of the hosted cluster creation, before the metallb-ingress was created? I am not finding any errors in the pod states:
clusters-test1   ovnkube-control-plane-76b46b8b94-2kmlf   3/3   Running   0   6d2h
clusters-test1   ovnkube-control-plane-76b46b8b94-b4p7r   3/3   Running   0   6d2h
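The alert text looks like the standard Prometheus TargetDown alert, so one way to test the lingering-alert theory is to check when the alert became active and whether it is still firing. A minimal sketch, run against the management (MCE) cluster; the namespace comes from the alert above, while direct access to the Prometheus API from inside the prometheus-k8s pod and the jq filter are my assumptions:
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s http://localhost:9090/api/v1/alerts | jq '.data.alerts[] | select(.labels.namespace=="clusters-test1") | {alert: .labels.alertname, state: .state, activeAt: .activeAt}'
If activeAt predates the metallb-ingress configuration and the state is still "firing", the targets are genuinely still unreachable; Prometheus clears the alert on its own once the underlying expression stops matching, so a purely stale alert would not keep firing for days.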
Must-gathers are here (behind VPN): http://perfscale.perf.eng.bos2.dc.redhat.com/pub/jhopper/OCP4/debug/agent_hcp_mg/
Version-Release number of selected component (if applicable):
OCP 4.14.21
CNV 4.14.6
multicluster-engine.v2.5.3
metallb-operator.v4.14.0-202406111908
How reproducible:
Alert fires for all 3 hosted clusters
Steps to Reproduce:
1. Install baremetal agent-provider Hosted Clusters as per the docs.
2. Configure ingress after the initial HC creation stage (a sketch of the MetalLB manifests is shown after these steps).
3. Check alerts on the mgmt/hub cluster.
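The ingress in step 2 was exposed through MetalLB running inside each hosted cluster. The manifests below are a reconstruction from the service description at the end of this report (the pool name, annotation, selector, and ports are taken from that output); the MetalLB namespace, the L2Advertisement name, and the pool's full address range are illustrative assumptions, not the exact YAML that was applied:
# Address pool and layer2 advertisement served by the metallb-operator; the
# range shown covers only the one allocated IP observed in the service below.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ingress-ip
  namespace: metallb-system
spec:
  addresses:
    - 198.18.10.11/32
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: ingress-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - ingress-ip
---
# LoadBalancer service fronting the hosted cluster's default ingress controller,
# matching the "oc describe svc metallb-ingress" output in Additional info.
apiVersion: v1
kind: Service
metadata:
  name: metallb-ingress
  namespace: openshift-ingress
  annotations:
    metallb.universe.tf/address-pool: ingress-ip
spec:
  type: LoadBalancer
  selector:
    ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443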
Actual results:
OVN alerts
Expected results:
No alert should fire if no ClusterOperator or OVN pod shows errors.
Additional info:
Example: excerpt from oc describe pod output for one of the ovnkube-control-plane pods in clusters-test1:
Annotations: hypershift.openshift.io/release-image: quay.io/openshift-release-dev/ocp-release:4.14.21-multi
k8s.ovn.org/pod-networks:
{"default":{"ip_addresses":["10.130.0.208/23"],"mac_address":"0a:58:0a:82:00:d0","gateway_ips":["10.130.0.1"],"routes":[{"dest":"10.128.0....
k8s.v1.cni.cncf.io/network-status:
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"10.130.0.208"
],
"mac": "0a:58:0a:82:00:d0",
"default": true,
"dns": {}
}]
networkoperator.openshift.io/cluster-network-cidr: 10.132.0.0/14
networkoperator.openshift.io/hybrid-overlay-status: disabled
networkoperator.openshift.io/ip-family-mode: single-stack
openshift.io/scc: restricted-v2
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Running
SeccompProfile: RuntimeDefault
IP: 10.130.0.208
IPs:
IP: 10.130.0.208
Controlled By: ReplicaSet/ovnkube-control-plane-76b46b8b94
Inside the hosted cluster all OVN pods look ok:
openshift-ovn-kubernetes ovnkube-node-67v22 8/8 Running 1 (6d2h ago) 6d2h
openshift-ovn-kubernetes ovnkube-node-6mqdp 8/8 Running 0 6d2h
openshift-ovn-kubernetes ovnkube-node-stpxp 8/8 Running 9 (6d1h ago) 6d2h
openshift-ovn-kubernetes ovnkube-node-wlplr 8/8 Running 0 6d2h
All HC COs (ClusterOperators) are available. Note that the alert continues to fire even after a successful z-stream upgrade (4.14.20 -> 4.14.21) of both the MCE cluster and the Hosted Clusters:
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.14.21 True False False 6d6h
baremetal 4.14.21 True False False 66d
cloud-controller-manager 4.14.21 True False False 66d
cloud-credential 4.14.21 True False False 66d
cluster-autoscaler 4.14.21 True False False 66d
config-operator 4.14.21 True False False 66d
console 4.14.21 True False False 22d
control-plane-machine-set 4.14.21 True False False 66d
csi-snapshot-controller 4.14.21 True False False 66d
dns 4.14.21 True False False 66d
etcd 4.14.21 True False False 66d
image-registry 4.14.21 True False False 66d
ingress 4.14.21 True False False 66d
insights 4.14.21 True False False 34d
kube-apiserver 4.14.21 True False False 66d
kube-controller-manager 4.14.21 True False False 66d
kube-scheduler 4.14.21 True False False 66d
kube-storage-version-migrator 4.14.21 True False False 6d6h
machine-api 4.14.21 True False False 66d
machine-approver 4.14.21 True False False 66d
machine-config 4.14.21 True False False 34d
marketplace 4.14.21 True False False 66d
monitoring 4.14.21 True False False 66d
network 4.14.21 True False False 66d
node-tuning 4.14.21 True False False 6d7h
openshift-apiserver 4.14.21 True False False 37d
openshift-controller-manager 4.14.21 True False False 66d
openshift-samples 4.14.21 True False False 6d7h
operator-lifecycle-manager 4.14.21 True False False 66d
operator-lifecycle-manager-catalog 4.14.21 True False False 66d
operator-lifecycle-manager-packageserver 4.14.21 True False False 35d
service-ca 4.14.21 True False False 66d
storage 4.14.21 True False False 66d
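Since the workload side looks healthy, the remaining question is whether the scrape path from the management cluster's monitoring stack to the ovnkube-control-plane metrics endpoints in the hosted control-plane namespace is intact. A minimal set of checks against the management cluster (the grep patterns are mine; the exact monitor and service names were not pulled from the must-gathers):
# oc -n clusters-test1 get servicemonitors,podmonitors
# oc -n clusters-test1 get svc,endpoints | grep -i ovn
# oc -n clusters-test1 get networkpolicy
# oc -n clusters-test1 get pods -o wide | grep ovnkube-control-plane
When the pods are Running but scrapes still fail, the usual suspects are a port or TLS mismatch between the monitor definition and the pod's metrics endpoint, or a NetworkPolicy in the control-plane namespace that blocks the scraping Prometheus; the objects listed above are the place to start ruling both out.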
# oc --kubeconfig kubeconfig-test1 describe svc -n openshift-ingress metallb-ingress
Name: metallb-ingress
Namespace: openshift-ingress
Labels: <none>
Annotations: metallb.universe.tf/address-pool: ingress-ip
metallb.universe.tf/ip-allocated-from-pool: ingress-ip
Selector: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 172.31.135.76
IPs: 172.31.135.76
LoadBalancer Ingress: 198.18.10.11
Port: http 80/TCP
TargetPort: 80/TCP
NodePort: http 30200/TCP
Endpoints: 198.18.10.32:80
Port: https 443/TCP
TargetPort: 443/TCP
NodePort: https 30136/TCP
Endpoints: 198.18.10.32:443
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal nodeAssigned 123m (x6 over 20h) metallb-speaker announcing from node "worker02" with protocol "layer2"
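Finally, since the ingress service above has endpoints and is being announced by MetalLB, it is worth checking on the management cluster whether the ovn-kubernetes-control-plane targets have ever been scraped successfully since the hosted clusters were created; that distinguishes "never reachable" from "stale alert". A sketch (the job label matcher is a guess at how the ServiceMonitor names the target; adjust it to the job shown in the alert):
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s --data-urlencode 'query=max_over_time(up{namespace="clusters-test1",job=~".*ovn.*"}[7d])' http://localhost:9090/api/v1/query | jq '.data.result'
A value of 0 for every returned series means the targets have never been reachable from this Prometheus, i.e. this is a real, ongoing scrape failure rather than a leftover alert from the initial provisioning window.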