OpenShift Bugs / OCPBUGS-37645

Cannot reach kubernetes.default.svc.cluster.local from worker nodes of a hosted cluster


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • 4.16.z
    • 4.15.z
    • HyperShift
    • Critical
    • Hypershift Sprint 257
    • Release Note Not Required
    • In Progress

      Description of problem:

      Start a debug session on one of the hosted cluster's worker nodes:

      oc debug node/ip-10-1-0-97.ca-central-1.compute.internal
      
      nslookup kubernetes.default.svc.cluster.local
      Server:         10.1.0.2
      Address:        10.1.0.2#53
      
      ** server can't find kubernetes.default.svc.cluster.local: NXDOMAIN
      
      curl -k https://172.30.0.1:443/readyz
      curl: (7) Failed to connect to 172.30.0.1 port 443: Connection refused
      
      sh-5.1# curl -k https://172.20.0.1:443/readyz
      ok
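      The same /readyz check can be repeated across nodes or service IPs with a small probe. This is a minimal sketch, assuming it runs from the `oc debug node/...` shell with bash and curl available; `probe_readyz` is a hypothetical helper, not part of any OpenShift tooling, and the optional port argument exists only so the function can be tried against other endpoints.

```shell
#!/usr/bin/env bash
# probe_readyz: hypothetical helper that requests /readyz on a kube-apiserver
# service IP and prints one line per probe.
# Usage: probe_readyz <ip> [port]   (port defaults to 443)
probe_readyz() {
  local ip="$1"
  local port="${2:-443}"
  local body
  # -s: no progress meter; -k: skip TLS verification, as in the manual check
  # --max-time 5: fail fast instead of hanging on a black-holed address
  if body=$(curl -sk --max-time 5 "https://${ip}:${port}/readyz"); then
    echo "${ip}:${port}: ${body}"
  else
    # $? still holds curl's exit status here (7 = failed to connect)
    echo "${ip}:${port}: unreachable (curl exit $?)"
  fi
}
```

      For example, `probe_readyz 172.30.0.1` and `probe_readyz 172.20.0.1` would be expected to reproduce the "Connection refused" and "ok" results shown above.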

      Version-Release number of selected component (if applicable):

      4.15.20

      Steps to Reproduce:

      Unknown

      Actual results:

      Pods on the hosted cluster's worker nodes are unable to connect to their internal kube-apiserver through the service IP.

      Expected results:

      Pods on the hosted cluster's worker nodes can connect to their kube-apiserver through the service IP.

      Additional info:

      Checked the Konnectivity server logs in Dynatrace and found that the following errors occur repeatedly:

      E0724 01:02:00.223151       1 server.go:895] "DIAL_RSP contains failure" err="dial tcp 172.30.176.80:8443: i/o timeout" dialID=8375732890105363305 agentID="1eab211f-6ea1-46ea-bc78-14d75d6ba325"
      
      E0724 01:02:00.223482       1 tunnel.go:150] "Received failure on connection" err="read tcp 10.128.17.15:8090->10.128.82.107:52462: use of closed network connection" 
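      • The first error is a DIAL_RSP failure reported by the Konnectivity agent (agentID 1eab211f-6ea1-46ea-bc78-14d75d6ba325): the dial to 172.30.176.80:8443 that the Konnectivity server requested timed out.
      • The second error indicates that the server tried to read from a tunnel connection (10.128.17.15:8090 -> 10.128.82.107:52462) that had already been closed.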
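      Because the first error repeats, it can help to check whether the failures always point at the same target or are spread across several. A minimal sketch, assuming the Konnectivity server log has been exported from Dynatrace to a plain-text file in the klog format shown above; `summarize_dial_failures` and the file name in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# summarize_dial_failures: hypothetical helper that counts Konnectivity
# "DIAL_RSP contains failure" errors per dial target in an exported log file.
# Usage: summarize_dial_failures <logfile>
summarize_dial_failures() {
  local logfile="$1"
  # Keep only the dial-failure lines, extract "dial tcp <ip>:<port>",
  # then count occurrences of each target, most frequent first.
  grep -F 'DIAL_RSP contains failure' "$logfile" \
    | grep -oE 'dial tcp [0-9.]+:[0-9]+' \
    | awk '{print $3}' \
    | sort | uniq -c | sort -rn
}
```

      For example, `summarize_dial_failures konnectivity-server.log` run over just the two log lines above prints a single entry: a count of 1 for 172.30.176.80:8443.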

      Relevant OHSS Ticket: https://issues.redhat.com/browse/OHSS-36053

      Slack thread discussion

              agarcial@redhat.com Alberto Garcia Lamela
              sbai@redhat.com Shawn Bai (Inactive)
              He Liu