Type: Bug
Resolution: Duplicate
Following the original OCP bug https://issues.redhat.com/browse/OCPBUGS-41844, a workaround was found, but it does not work on our bare-metal cluster.
The problem is visible on OpenShift Data Foundation Provider-Client setups, on HCP KubeVirt client clusters, with OCP 4.16 and 4.17 and ODF 4.16 and 4.17.
Original issue summary:
StorageClient fails to connect to a NodePort service on the hosted node's corresponding bare-metal node in an HCP environment. The issue is intermittent, and the connection only succeeds when the StorageClient pod runs on a different hosted node or connects to a different NodePort IP.
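For triage, it helps to correlate which hosted node the StorageClient pod landed on with the provider endpoint (node IP:nodePort) it dials. A minimal sketch follows; the client namespace, the spec.storageProviderEndpoint field, and the ocs-provider-server service name are assumptions based on ODF defaults, so adjust them to the actual deployment:

# Which hosted node is the client pod scheduled on? (namespace is an assumption)
oc get pod -n openshift-storage-client -o wide

# Which provider endpoint does each StorageClient dial?
# (assumes the CR exposes it as spec.storageProviderEndpoint)
oc get storageclient -A -o jsonpath='{range .items[*]}{.metadata.name}{" -> "}{.spec.storageProviderEndpoint}{"\n"}{end}'

# On the provider cluster: the NodePort service behind that endpoint
# (service name assumed from ODF provider-mode defaults)
oc get svc -n openshift-storage ocs-provider-server -o wide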
These two comments led us to open this bug:
If the virt-launcher NetworkPolicy is edited with the following (after scaling the HyperShift operator replicas to 0 so it does not revert the edit), everything works fine:
- namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: openshift-storage
  podSelector:
    matchLabels:
      app: ocsProviderApiServer
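For context, a sketch of where that entry lands in the virt-launcher NetworkPolicy is below; the policy name, namespace, and pod selector are assumptions (only the added "to:" entry comes from the comment above), so treat it as an illustration rather than the exact object:

# Illustrative sketch of the edited virt-launcher NetworkPolicy.
# metadata and podSelector are assumptions; the egress "to:" entry is
# the rule quoted above, allowing the VMs to reach the provider API server.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: virt-launcher                   # assumed name
  namespace: <hosted-control-plane-ns>  # assumed namespace
spec:
  podSelector:
    matchLabels:
      kubevirt.io: virt-launcher        # assumed selector
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: openshift-storage
      podSelector:
        matchLabels:
          app: ocsProviderApiServer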
So what is really the bug is that ovn-kubernetes does not allow access to the other nodes.
So the following bash script works upstream but does not work on your bare-metal cluster; please open a bug with the SDN team (ovn-kubernetes):
#!/bin/bash -xe
export KUBECONFIG=${KUBECONFIG:-~/Documents/cnv/sandbox/kubeconfig}

oc apply -f - <<EOF
---
apiVersion: v1
kind: Namespace
metadata:
  name: foo
---
apiVersion: v1
kind: Namespace
metadata:
  name: bar
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  namespace: foo
  name: alpine
spec:
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 10.128.0.0/14
        - 172.30.0.0/16
    - ipBlock:
        cidr: ::/0
  ingress:
  - ports:
    - protocol: TCP
  podSelector:
    matchLabels:
      name: alpine
  policyTypes:
  - Ingress
  - Egress
---
apiVersion: v1
kind: Pod
metadata:
  namespace: foo
  name: alpine
  labels:
    name: alpine
spec:
  containers:
  - image: alpine:3.2
    command:
    - /bin/sh
    - "-c"
    - "sleep 60m"
    imagePullPolicy: IfNotPresent
    name: alpine
  restartPolicy: Always
---
apiVersion: v1
kind: Pod
metadata:
  namespace: bar
  name: nginx
  labels:
    app.kubernetes.io/name: proxy
spec:
  containers:
  - name: nginx
    image: nginx:stable
    ports:
    - containerPort: 80
      name: http-web-svc
---
apiVersion: v1
kind: Service
metadata:
  namespace: bar
  name: nginx-service
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: proxy
  ports:
  - name: name-of-service-port
    protocol: TCP
    port: 80
    targetPort: http-web-svc
EOF

node_port=$(oc get svc -n bar nginx-service -o json | jq '.spec.ports[0].nodePort')
for node_ip in $(oc get node -o wide --no-headers | awk '{print $6}'); do
  oc exec -it -n foo alpine -- wget http://$node_ip:$node_port -O index.html
done
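Per the comments above, the expected result is that every loop iteration fetches index.html from each node IP; on the affected bare-metal cluster the wget fails for some node IPs, which is the behavior this bug tracks.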
Slack discussion and logs:
https://ibm-systems-storage.slack.com/archives/C06EPQRBM36/p1726047883407299
https://redhat.enterprise.slack.com/archives/C019X3PEF2B/p1726051047584979?thread_ts=1726051047.584979&cid=C019X3PEF2B
https://redhat.enterprise.slack.com/archives/C02UVQRJG83/p1726055560238719?thread_ts=1726055560.238719&cid=C02UVQRJG83
Related bugs:
Client heartbeat missing on provider after upgrading to 4.17 - https://bugzilla.redhat.com/show_bug.cgi?id=2311357
[Provider mode] StorageClient connection fails. Failed to create a new provider client: failed to dial. - https://bugzilla.redhat.com/show_bug.cgi?id=2281536
HyperShift dump - https://drive.google.com/file/d/1NCFB2f2kOifgOiNsOFmkaJJysuSwtxG_/view?usp=sharing
OCP mg - https://drive.google.com/file/d/1oy8Jy_v849UPzm6L6Z6jjKwRTwtuG7JH/view?usp=sharing
OCS mg - https://drive.google.com/file/d/1kc-eIsSfi5QF8yY2RsDcpGRp5lh4NgQu/view?usp=sharing