Bug
Resolution: Unresolved
Normal
None
4.14
None
Moderate
None
False
Description of problem:
I have 3 Agent-provider hosted clusters installed, all configured with a metallb-ingress. Everything appears to be working as expected; I can access the hosted Prometheus over the ingress, etc. However, the MCE cluster is reporting this alert for all 3 hosted clusters:

    100% of the ovnkube-control-plane/ovn-kubernetes-control-plane targets in Namespace clusters-test[1-3] namespace have been unreachable for more than 15 minutes. This may be a symptom of network connectivity issues, down nodes, or failures within these components. Assess the health of the infrastructure and nodes running these targets and then contact support.

Is it possible this is a lingering alert state that was not cleared after it fired during the initial part of the hosted cluster creation, before the metallb-ingress was created? I am not finding any errors in the pod states:

clusters-test1   ovnkube-control-plane-76b46b8b94-2kmlf   3/3   Running   0   6d2h
clusters-test1   ovnkube-control-plane-76b46b8b94-b4p7r   3/3   Running   0   6d2h
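One way to distinguish a stale alert from a live scrape failure is to query the `up` series behind the alert directly on the MCE cluster. This is a minimal sketch, assuming the default openshift-monitoring thanos-querier route and that the alert is the stock TargetDown rule evaluated over `up`; the service label value is taken from the alert text and may differ in practice:

TOKEN=$(oc whoami -t)
HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
# up == 0 means the scrape is failing right now; an empty result would mean
# the target is no longer present in the scrape configuration at all.
curl -sk -H "Authorization: Bearer $TOKEN" \
  "https://$HOST/api/v1/query" \
  --data-urlencode 'query=up{namespace=~"clusters-test[1-3]",service="ovn-kubernetes-control-plane"}'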
Must-gathers are here (behind VPN): http://perfscale.perf.eng.bos2.dc.redhat.com/pub/jhopper/OCP4/debug/agent_hcp_mg/
Version-Release number of selected component (if applicable):
OCP 4.14.21
CNV 4.14.6
multicluster-engine.v2.5.3
metallb-operator.v4.14.0-202406111908
How reproducible:
Alert fires for all 3 hosted clusters
Steps to Reproduce:
1. Install baremetal agent-provider Hosted Clusters as per the docs.
2. Configure ingress after the initial HC creation stage.
3. Check alerts on the mgmt/hub cluster (see the sketch below).
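For step 3, a minimal sketch for listing the firing alerts from the hub's Prometheus (assuming the default in-cluster monitoring stack and that curl is available in the prometheus container; the alertname filter is an assumption based on the alert text, which matches the stock TargetDown rule):

# alertname assumed from the alert description text; adjust if the rule differs
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
  curl -s 'http://localhost:9090/api/v1/alerts' \
  | jq '.data.alerts[] | select(.labels.alertname == "TargetDown")'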
Actual results:
OVN target-unreachable alerts fire on the MCE cluster for all 3 hosted clusters.
Expected results:
No alert should fire when no ClusterOperator (CO) or OVN pod shows errors.
Additional info:
ex:

Annotations:      hypershift.openshift.io/release-image: quay.io/openshift-release-dev/ocp-release:4.14.21-multi
                  k8s.ovn.org/pod-networks: {"default":{"ip_addresses":["10.130.0.208/23"],"mac_address":"0a:58:0a:82:00:d0","gateway_ips":["10.130.0.1"],"routes":[{"dest":"10.128.0....
                  k8s.v1.cni.cncf.io/network-status: [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.130.0.208" ], "mac": "0a:58:0a:82:00:d0", "default": true, "dns": {} }]
                  networkoperator.openshift.io/cluster-network-cidr: 10.132.0.0/14
                  networkoperator.openshift.io/hybrid-overlay-status: disabled
                  networkoperator.openshift.io/ip-family-mode: single-stack
                  openshift.io/scc: restricted-v2
                  seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:           Running
SeccompProfile:   RuntimeDefault
IP:               10.130.0.208
IPs:
  IP:             10.130.0.208
Controlled By:    ReplicaSet/ovnkube-control-plane-76b46b8b94

Inside the hosted cluster, all OVN pods look OK:

openshift-ovn-kubernetes   ovnkube-node-67v22   8/8   Running   1 (6d2h ago)   6d2h
openshift-ovn-kubernetes   ovnkube-node-6mqdp   8/8   Running   0              6d2h
openshift-ovn-kubernetes   ovnkube-node-stpxp   8/8   Running   9 (6d1h ago)   6d2h
openshift-ovn-kubernetes   ovnkube-node-wlplr   8/8   Running   0              6d2h

All HC COs are available. Note that the alert continues to fire even after a successful minor-version (.20->.21) upgrade of both the MCE cluster and the Hosted Clusters.

NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.21   True        False         False      6d6h
baremetal                                  4.14.21   True        False         False      66d
cloud-controller-manager                   4.14.21   True        False         False      66d
cloud-credential                           4.14.21   True        False         False      66d
cluster-autoscaler                         4.14.21   True        False         False      66d
config-operator                            4.14.21   True        False         False      66d
console                                    4.14.21   True        False         False      22d
control-plane-machine-set                  4.14.21   True        False         False      66d
csi-snapshot-controller                    4.14.21   True        False         False      66d
dns                                        4.14.21   True        False         False      66d
etcd                                       4.14.21   True        False         False      66d
image-registry                             4.14.21   True        False         False      66d
ingress                                    4.14.21   True        False         False      66d
insights                                   4.14.21   True        False         False      34d
kube-apiserver                             4.14.21   True        False         False      66d
kube-controller-manager                    4.14.21   True        False         False      66d
kube-scheduler                             4.14.21   True        False         False      66d
kube-storage-version-migrator              4.14.21   True        False         False      6d6h
machine-api                                4.14.21   True        False         False      66d
machine-approver                           4.14.21   True        False         False      66d
machine-config                             4.14.21   True        False         False      34d
marketplace                                4.14.21   True        False         False      66d
monitoring                                 4.14.21   True        False         False      66d
network                                    4.14.21   True        False         False      66d
node-tuning                                4.14.21   True        False         False      6d7h
openshift-apiserver                        4.14.21   True        False         False      37d
openshift-controller-manager               4.14.21   True        False         False      66d
openshift-samples                          4.14.21   True        False         False      6d7h
operator-lifecycle-manager                 4.14.21   True        False         False      66d
operator-lifecycle-manager-catalog         4.14.21   True        False         False      66d
operator-lifecycle-manager-packageserver   4.14.21   True        False         False      35d
service-ca                                 4.14.21   True        False         False      66d
storage                                    4.14.21   True        False         False      66d

# oc --kubeconfig kubeconfig-test1 describe svc -n openshift-ingress metallb-ingress
Name:                     metallb-ingress
Namespace:                openshift-ingress
Labels:                   <none>
Annotations:              metallb.universe.tf/address-pool: ingress-ip
                          metallb.universe.tf/ip-allocated-from-pool: ingress-ip
Selector:                 ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.31.135.76
IPs:                      172.31.135.76
LoadBalancer Ingress:     198.18.10.11
Port:                     http  80/TCP
TargetPort:               80/TCP
NodePort:                 http  30200/TCP
Endpoints:                198.18.10.32:80
Port:                     https  443/TCP
TargetPort:               443/TCP
NodePort:                 https  30136/TCP
Endpoints:                198.18.10.32:443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason        Age                 From             Message
  ----    ------        ----                ----             -------
  Normal  nodeAssigned  123m (x6 over 20h)  metallb-speaker  announcing from node "worker02" with protocol "layer2"
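Since the pods and ClusterOperators all look healthy, the scrape path itself may be worth checking on the MCE cluster. A minimal sketch, assuming the Service/Endpoints object is named after the service label in the alert (ovn-kubernetes-control-plane) and is scraped via a ServiceMonitor in the hosted control-plane namespace:

# An empty Endpoints object while the pods are Running would point at the
# service selector or metrics port rather than at the pods themselves.
oc get endpoints -n clusters-test1 ovn-kubernetes-control-plane -o wide
oc get servicemonitors -n clusters-test1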