OpenShift Bugs / OCPBUGS-43083

Pods cannot connect to apiserver in IPv6 disconnected hosted cluster


    • Critical
    • None
    • Hypershift Sprint 261, Hypershift Sprint 262, Hypershift Sprint 263
    • 3
    • False
    • None
    • *Cause*: The addresses passed to the Kubernetes EndpointSlice on the cluster are not set properly.
      *Consequence*: The IPv6 disconnected worker nodes cannot connect to the KAS.
      *Fix*: Modified the way the address provided to the slice is evaluated.
      *Result*: The bug no longer occurs.
    • Bug Fix
    • In Progress

      Description of problem:

      We are installing a 4.17 agent-based hosted cluster on bare metal with an IPv6 stack in a disconnected environment. We cannot install the MetalLB operator on the hosted cluster to expose the OpenShift router and handle ingress, because the openshift-marketplace pods that extract the operator bundle, and the related pods, are in Error state. They try to execute the following command but cannot reach the cluster apiserver:
      
      opm alpha bundle extract -m /bundle/ -n openshift-marketplace -c b5a818607a7a162d7f9a13695046d44e47d8127a45cad69c0d8271b2da945b1 -z
      
      INFO[0000] Using in-cluster kube client config          
      Error: error loading manifests from directory: Get "https://[fd02::1]:443/api/v1/namespaces/openshift-marketplace/configmaps/b5a818607a7a162d7f9a13695046d44e47d8127a45cad69c0d8271b2da945b1": dial tcp [fd02::1]:443: connect: connection refused
      
      
      
      In our hosted cluster, fd02::1 is the clusterIP of the kubernetes service, and the endpoint associated with the service is [fd00::1]:6443. When debugging the pods, we see that the connection to the clusterIP is refused, but a connection to its endpoint is established and returns 403 Forbidden:
      
      sh-5.1$ curl -k https://[fd02::1]:443
      curl: (7) Failed to connect to fd02::1 port 443: Connection refused
      
      
      sh-5.1$ curl -k https://[fd00::1]:6443
      {
        "kind": "Status",
        "apiVersion": "v1",
        "metadata": {},
        "status": "Failure",
        "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
        "reason": "Forbidden",
        "details": {},
        "code": 403
      }
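
The two curl checks above can also be scripted, so the same comparison can be repeated from several pods. This is only a minimal sketch: the helper name `probe` and the 3-second timeout are assumptions for illustration, and it tests TCP reachability only (no TLS, no HTTP), which is enough to tell "connection refused" apart from "connected".

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection and classify the outcome.

    `probe` is a hypothetical helper, not part of any OpenShift tooling.
    IPv6 literals are passed without brackets, e.g. "fd02::1".
    """
    try:
        # create_connection resolves the host (IPv4 or IPv6) and connects.
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # Same condition curl reports as "(7) ... Connection refused".
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. network unreachable or no route to host.
        return f"error: {exc}"
```

Run from inside an affected pod, `probe("fd02::1", 443)` checks the clusterIP of the kubernetes service and `probe("fd00::1", 6443)` checks its endpoint. With the behaviour described in this report, the first call would be expected to return "refused" and the second "connected".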
      
      The same issue affects other pods in the hosted cluster that are in Error or CrashLoopBackOff state; their logs show similar errors, e.g.:
      
      F1011 09:11:54.129077       1 cmd.go:162] failed checking apiserver connectivity: Get "https://[fd02::1]:443/apis/coordination.k8s.io/v1/namespaces/openshift-service-ca-operator/leases/service-ca-operator-lock": dial tcp [fd02::1]:443: connect: connection refused
      
      
      An IPv6 disconnected 4.16 hosted cluster with the same configuration was installed successfully and did not show this issue, and neither did an IPv4 disconnected 4.17 hosted cluster. So the issue occurs with the IPv6 stack only.
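
Because the failure is specific to IPv6, one possible check is whether each address listed in the EndpointSlice behind the kubernetes service belongs to the family the slice declares. The sketch below is an assumption-laden illustration, not the actual Hypershift fix: the function names are hypothetical, and it relies only on the fact that the EndpointSlice `addressType` field takes the values `IPv4`, `IPv6`, or `FQDN`.

```python
import ipaddress


def address_type(addr: str) -> str:
    """Return the EndpointSlice addressType that fits a bare address.

    Hypothetical helper: anything that does not parse as an IP address
    is treated as an FQDN.
    """
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return "FQDN"
    return "IPv6" if ip.version == 6 else "IPv4"


def mismatched_addresses(declared_type: str, addresses: list[str]) -> list[str]:
    """List the addresses whose family differs from the declared addressType."""
    return [a for a in addresses if address_type(a) != declared_type]
```

For the endpoint seen in this cluster, `address_type("fd00::1")` yields `IPv6`, so a slice declared as `IPv6` that carries only that address would produce an empty mismatch list. A non-empty list would point at an address that was not set consistently with the declared family.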

      Version-Release number of selected component (if applicable):

      Hub cluster: 4.17.0-0.nightly-2024-10-10-004834
      
      MCE 2.7.0-DOWNANDBACK-2024-09-27-14-52-56
      
      Hosted cluster: version 4.17.1
      image: registry.ci.openshift.org/ocp/release@sha256:e16ac60ac6971e5b6f89c1d818f5ae711c0d63ad6a6a26ffe795c738e8cc4dde

      How reproducible:

      100%

      Steps to Reproduce:

          1. Install MCE 2.7 on a 4.17 IPv6 disconnected BM hub cluster
          2. Install a 4.17 agent-based hosted cluster and scale up the nodepool
          3. After the worker nodes are installed, attempt to install the MetalLB operator to handle ingress
          

      Actual results:

      MetalLB operator cannot be installed because pods cannot connect to the cluster apiserver.

      Expected results:

      Pods in the cluster can connect to apiserver. 

      Additional info:

       

       

              jparrill@redhat.com Juan Manuel Parrilla Madrid
              epassaro@redhat.com Elsa Passaro
              Liangquan Li Liangquan Li
              Lubov Shilin