-
Feature Request
-
Resolution: Done
-
Major
- Proposed title of this feature request
- Enable network-check-source pod scheduling to nodes of master role or infra role
- What is the nature and description of the request?
- Since OCP 4.7, network connection health checks are performed by controllers in the openshift-network-diagnostics namespace.
- However, this does not work properly in OCP clusters whose worker nodes are in separate networks.
- The network-check-source pod can be deployed on any node. If it runs on a network-isolated worker node, the podnetworkconnectivitycheck resources are not created properly. The pod should be scheduled on a node that can communicate with every other node, but users cannot schedule the network-check-source pod on a specific node.
- If the pod ran on a master node, this issue would be resolved.
- Users currently cannot control the scheduling of the network-check-source pod in openshift-network-diagnostics because the Cluster Network Operator manages the resource. [1]
[1] network-check-source: https://github.com/openshift/cluster-network-operator/blob/master/bindata/network-diagnostics/network-check-source.yaml
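As an illustration of the request, the scheduling constraint could look roughly like the pod template fragment below. This is only a sketch, not the actual content of the operator's manifest: it assumes that master nodes carry the standard `node-role.kubernetes.io/master` label and `NoSchedule` taint, and the surrounding Deployment fields are omitted.

```yaml
# Hypothetical excerpt of the network-check-source Deployment pod template.
# Assumption: master nodes have the node-role.kubernetes.io/master label
# and the matching NoSchedule taint.
spec:
  template:
    spec:
      # Restrict the pod to master nodes.
      nodeSelector:
        node-role.kubernetes.io/master: ""
      # Tolerate the master taint so the pod can actually be scheduled there.
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
```

Because the Cluster Network Operator manages this resource, such a constraint would have to be applied by the operator itself rather than edited by hand.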
- Why does the customer need this? (List the business requirements here)
- The customer's OCP cluster, whose worker nodes are in separate networks, cannot use openshift-network-diagnostics.
- In the customer's production OCP cluster, users cannot check the network status of all cluster nodes.
- List any affected packages or components.
- Cluster Network Operator
- openshift-network-diagnostics namespace
- network-check-source
- How reproducible:
#1. Configure two groups of worker nodes in separate networks
workerA.testocp.lab.com
workerB.testocp.lab.com
The two nodes cannot reach each other.
#2. The network-check-source pod is running on the workerA node.
$ oc get pods -n openshift-network-diagnostics -o wide | grep worker
network-check-source-644477f5f5-hwfbl 1/1 Running 0 47h 172.31.96.9 workerA.testocp.lab.com <none> <none>
network-check-target-8wjvv 1/1 Running 0 17h 172.31.12.5 workerA.testocp.lab.com <none> <none>
network-check-target-szd9g 1/1 Running 0 17h 172.31.93.3 workerB.testocp.lab.com <none> <none>
$ oc get svc -n openshift-network-diagnostics
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
network-check-source ClusterIP None <none> 17698/TCP 57d
network-check-target ClusterIP 172.30.140.241 <none> 80/TCP 57d
#3. Run a curl test from the network-check-source pod to the network-check-target pods
$ oc rsh network-check-source-644477f5f5-hwfbl    (this pod is running on the workerA node)
sh-4.4$ curl 172.31.12.5:8080    // request to the workerA network-check-target succeeds
Hello
sh-4.4$ curl 172.31.93.3:8080    // request to the workerB network-check-target fails: failed to establish a TCP connection to 172.31.93.3:8080: dial tcp 172.31.93.3:8080: connect: no route to host
#4. $ oc get podnetworkconnectivitycheck -n openshift-network-diagnostics
NAME
network-check-source-workerA-to-*
==> Only the network-check-source-workerA-to-* podnetworkconnectivitycheck resources were created;
network-check-source-workerB-to-* was not created.
#5. The events below keep being generated every 6 minutes
$ oc get event
1m18s Normal ConnectivityRestored node/workerA.testocp.lab.com Connectivity restored after 7m0.706557921s: network-check-target-service-cluster: tcp connection to network-check-target:80 succeeded
1m18s Warning ConnectivityOutageDetected node/workerA.testocp.lab.com Connectivity outage detected: network-check-target-service-cluster: failed to establish a TCP connection to network-check-target:80: dial tcp 172.30.140.241:80: connect: no route to host
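For step #2, the node hosting the network-check-source pod can also be extracted from the `oc get pods -o wide` output with a small filter. The `source_node` helper below is a hypothetical name introduced only for this sketch. It assumes the plain column layout shown above, where the node name is the seventh whitespace-separated field.

```shell
# Hypothetical helper: reads `oc get pods -o wide` output on stdin and
# prints the node name of every network-check-source pod.
# Assumption: NODE is the 7th whitespace-separated column, as in the
# sample output in step #2.
source_node() {
  awk '$1 ~ /^network-check-source-/ { print $7 }'
}
```

It could be used as `oc get pods -n openshift-network-diagnostics -o wide | source_node`, which for the output in step #2 would print the workerA node name.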