  1. OpenShift Request For Enhancement
  2. RFE-2304

[RFE] Enable network-check-source pod scheduling to nodes of master role or infra role


    • Feature Request
    • Resolution: Done
    • Major
    • openshift-4.15
    • SDN

      1. Proposed title of this feature request
        • Enable network-check-source pod scheduling to nodes of master role or infra role
      2. What is the nature and description of the request?
        • Since OCP 4.7, network connection health checks are performed by controllers in the openshift-network-diagnostics namespace.
        • However, these checks do not work correctly in OCP clusters whose worker nodes are in separate networks.
        • The network-check-source pod can be scheduled on any node. If it lands on a network-isolated worker node, the podnetworkconnectivitycheck resources are not created correctly. The pod should be scheduled on a node that can communicate with every other node, but users cannot schedule the network-check-source pod on a specific node.
        • If the pod ran on a master node, this issue would be resolved.
        • Users currently cannot control where the network-check-source pod in openshift-network-diagnostics is scheduled, because the Cluster Network Operator manages the resource. [1]
          [1] network-check-source: https://github.com/openshift/cluster-network-operator/blob/master/bindata/network-diagnostics/network-check-source.yaml
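To illustrate what the request amounts to, here is a minimal sketch of the kind of pod-spec scheduling fields that would pin a pod to master or infra nodes. It assumes the conventional `node-role.kubernetes.io/<role>` label and taint keys; the helper name `scheduling_fragment` is hypothetical, and this is not what the Cluster Network Operator actually renders. Taints on infra nodes in particular vary between clusters.

```python
import json

# Assumption: conventional Kubernetes node-role label/taint key prefix.
ROLE_LABEL_PREFIX = "node-role.kubernetes.io/"


def scheduling_fragment(role: str) -> dict:
    """Return a pod-spec fragment that pins a pod to nodes of the given role.

    Hypothetical helper for illustration only; not part of any operator API.
    """
    if role not in ("master", "infra"):
        raise ValueError(f"unsupported role: {role}")
    key = ROLE_LABEL_PREFIX + role
    return {
        # Only nodes carrying the role label are eligible.
        "nodeSelector": {key: ""},
        # Tolerate a NoSchedule taint with the same key, if the node has one.
        "tolerations": [
            {"key": key, "operator": "Exists", "effect": "NoSchedule"}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(scheduling_fragment("master"), indent=2))
```

Because the operator owns the network-check-source resource, manually adding such fields to it is not an option today, which is the gap this request describes.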
      3. Why does the customer need this? (List the business requirements here)
        • The customer's OCP cluster, whose worker nodes are in separate networks, cannot use openshift-network-diagnostics.
        • In the customer's production OCP cluster, users cannot check the network status of all cluster nodes.
      4. List any affected packages or components.
        • Cluster Network Operator
        • openshift-network-diagnostics namespace
        • network-check-source 
      5. How reproducible:

      #1. Configure two groups of worker nodes in separate networks
      workerA.testocp.lab.com
      workerB.testocp.lab.com
      The two nodes cannot connect to each other.

      #2. The network-check-source pod is running on the workerA node.
      $ oc get pods -o wide -n openshift-network-diagnostics | grep worker
      network-check-source-644477f5f5-hwfbl 1/1 Running 0 47h 172.31.96.9 workerA.testocp.lab.com <none> <none>
      network-check-target-8wjvv 1/1 Running 0 17h 172.31.12.5 workerA.testocp.lab.com <none> <none>
      network-check-target-szd9g 1/1 Running 0 17h 172.31.93.3 workerB.testocp.lab.com <none> <none>

      $oc get svc
      NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
      network-check-source ClusterIP None <none> 17698/TCP 57d
      network-check-target ClusterIP 172.30.140.241 <none> 80/TCP 57d

      #3. curl test from the network-check-source pod to the network-check-target pods
      $ oc rsh network-check-source-644477f5f5-hwfbl  (this pod is running on the workerA node)

      sh-4.4$ curl 172.31.12.5:8080 // request to the workerA network-check-target succeeds
      Hello

      sh-4.4$ curl 172.31.93.3:8080 // request to the workerB network-check-target fails: failed to establish a TCP connection to 172.31.93.3:8080: dial tcp 172.31.93.3:8080: connect: no route to host

      #4. $ oc get podnetworkconnectivitycheck -n openshift-network-diagnostics
      NAME
      network-check-source-workerA-to-*
      ==> Only the network-check-source-workerA-to-* podnetworkconnectivitycheck resources were created;
      network-check-source-workerB-to-* was not created.

      #5. The events below keep being created every 6 minutes
      $oc get event
      1m18s Normal ConnectivityRestored node/workerA.testocp.lab.com Connectivity restored after 7m0.706557921s: network-check-target-service-cluster: tcp connection to network-check-target:80 succeeded
      1m18s Warning ConnectivityOutageDetected node/workerA.testocp.lab.com Connectivity outage detected: network-check-target-service-cluster: failed to establish a TCP connection to network-check-target:80: dial tcp 172.30.140.241:80: connect: no route to host
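The manual curl test in step #3 can be approximated with a small TCP probe. The following is a minimal sketch, assuming it is run from a host or pod in the same network position as the network-check-source pod; the function name `tcp_reachable` is hypothetical, and the IPs and port are the ones shown in steps #2 and #3.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "no route to host", connection refused, and timeouts.
        return False


if __name__ == "__main__":
    # network-check-target pod IPs from step #2; port 8080 as used in step #3.
    for ip in ("172.31.12.5", "172.31.93.3"):
        state = "reachable" if tcp_reachable(ip, 8080) else "unreachable"
        print(f"{ip}:8080 {state}")
```

In the environment described above, one would expect the workerA target to be reported as reachable and the workerB target as unreachable, matching the curl results.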

            mcurry@redhat.com Marc Curry
            rhn-support-hyoskim Sophia Hyosun Kim