
Pods cannot connect to apiserver in IPv6 disconnected hosted cluster


    • Bug
    • Resolution: Unresolved
    • Critical
    • 4.18.0
    • 4.15, 4.16, 4.17, 4.18
    • HyperShift
    • Critical
    • None
    • Proposed
    • False
    • * Previously, incorrect addresses were passed to the Kubernetes EndpointSlice on a cluster, which prevented the installation of the MetalLB Operator on an Agent-based cluster in an IPv6 disconnected environment. With this release, a fix modifies the address evaluation method. Red{nbsp}Hat Marketplace pods can now successfully connect to the cluster API server, so that the MetalLB Operator can be installed and ingress traffic can be handled in IPv6 disconnected environments. (link:https://issues.redhat.com/browse/OCPBUGS-46665[*OCPBUGS-46665*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-43083. The following is the description of the original issue:

      Description of problem:

      We are installing a 4.17 agent-based hosted cluster on bare metal with an IPv6 stack in a disconnected environment. We cannot install the MetalLB operator on the hosted cluster to expose the OpenShift router and handle ingress, because the openshift-marketplace pods that extract the operator bundle, and the related pods, are in Error state. They try to run the following command but cannot reach the cluster apiserver:
      
      opm alpha bundle extract -m /bundle/ -n openshift-marketplace -c b5a818607a7a162d7f9a13695046d44e47d8127a45cad69c0d8271b2da945b1 -z
      
      INFO[0000] Using in-cluster kube client config          
      Error: error loading manifests from directory: Get "https://[fd02::1]:443/api/v1/namespaces/openshift-marketplace/configmaps/b5a818607a7a162d7f9a13695046d44e47d8127a45cad69c0d8271b2da945b1": dial tcp [fd02::1]:443: connect: connection refused
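The address in the error above is what an in-cluster client would dial. As a minimal sketch, assuming the conventional in-cluster setup where the API server address is read from the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables, the URL can be derived as follows. The helper name `in_cluster_api_url` is hypothetical and is not part of `opm` or any Kubernetes library.

```python
import os


def in_cluster_api_url(env=None):
    """Build the API server URL from the conventional in-cluster variables.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    host = env["KUBERNETES_SERVICE_HOST"]
    port = env["KUBERNETES_SERVICE_PORT"]
    # An IPv6 literal must be wrapped in brackets inside a URL authority.
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"https://{host}:{port}"


if __name__ == "__main__":
    # Values taken from the error message in this report.
    example = {"KUBERNETES_SERVICE_HOST": "fd02::1", "KUBERNETES_SERVICE_PORT": "443"}
    print(in_cluster_api_url(example))
```

Running this inside an affected pod without the `example` override would show which address the pod was actually handed, which can then be compared with the clusterIP of the kubernetes service.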
      
      
      
      In our hosted cluster, fd02::1 is the clusterIP of the kubernetes service, and the endpoint associated with the service is [fd00::1]:6443. Debugging from inside the pods shows that the connection to the clusterIP is refused, whereas a connection to the endpoint is established and returns 403 Forbidden:
      
      sh-5.1$ curl -k https://[fd02::1]:443
      curl: (7) Failed to connect to fd02::1 port 443: Connection refused
      
      
      sh-5.1$ curl -k https://[fd00::1]:6443
      {
        "kind": "Status",
        "apiVersion": "v1",
        "metadata": {},
        "status": "Failure",
        "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
        "reason": "Forbidden",
        "details": {},
        "code": 403
      }
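The two curl checks above can be repeated at the TCP level when curl is not available in a pod image. The following is a minimal sketch, assuming only the Python standard library is present; the `probe` helper is hypothetical, and the addresses are the ones quoted in this report.

```python
import socket


def probe(host, port, timeout=3.0):
    """Attempt a TCP connect and classify the outcome.

    Returns 'open', 'refused', 'timeout', or 'error: <reason>'.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    # Addresses from this report: service clusterIP and its backing endpoint.
    targets = [
        ("service clusterIP", "fd02::1", 443),
        ("endpoint", "fd00::1", 6443),
    ]
    for label, host, port in targets:
        print(f"{label} [{host}]:{port} -> {probe(host, port)}")
```

A result of `refused` for the clusterIP together with `open` for the endpoint would match the behaviour described here: the API server is reachable, but traffic sent to the service address is not delivered to it.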
      
      The same issue affects other pods in the hosted cluster, which are in Error or CrashLoopBackOff state. Their logs show similar errors, for example:
      
      F1011 09:11:54.129077       1 cmd.go:162] failed checking apiserver connectivity: Get "https://[fd02::1]:443/apis/coordination.k8s.io/v1/namespaces/openshift-service-ca-operator/leases/service-ca-operator-lock": dial tcp [fd02::1]:443: connect: connection refused
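To confirm that every failing pod is hitting the same address, the dial target can be pulled out of collected log lines. This is a rough sketch that assumes the `dial tcp <addr>:<port>: connect: <reason>` shape seen in the two errors quoted in this report; the function names are hypothetical and other log formats are not handled.

```python
import re
from collections import Counter

# Matches "dial tcp [fd02::1]:443: connect: connection refused" and the
# unbracketed IPv4 form "dial tcp 10.0.0.1:443: connect: ...".
_DIAL_RE = re.compile(
    r"dial tcp \[?(?P<host>[0-9A-Fa-f:.]+)\]?:(?P<port>\d+): connect: (?P<reason>.+)$"
)


def parse_dial_error(line):
    """Return (host, port, reason) for a dial error line, or None if no match."""
    match = _DIAL_RE.search(line.strip())
    if match is None:
        return None
    return match.group("host"), int(match.group("port")), match.group("reason")


def refused_targets(lines):
    """Count how often each (host, port) pair was refused across log lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_dial_error(line)
        if parsed and parsed[2] == "connection refused":
            counts[(parsed[0], parsed[1])] += 1
    return counts
```

If all counts land on a single address such as `("fd02::1", 443)`, the failures point at the service address rather than at individual pods.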
      
      
      An IPv6 disconnected 4.16 hosted cluster with the same configuration installed successfully and did not show this issue, and neither did an IPv4 disconnected 4.17 hosted cluster. So the issue appears only with the IPv6 stack.

      Version-Release number of selected component (if applicable):

      Hub cluster: 4.17.0-0.nightly-2024-10-10-004834
      
      MCE 2.7.0-DOWNANDBACK-2024-09-27-14-52-56
      
      Hosted cluster: version 4.17.1
      image: registry.ci.openshift.org/ocp/release@sha256:e16ac60ac6971e5b6f89c1d818f5ae711c0d63ad6a6a26ffe795c738e8cc4dde

      How reproducible:

      100%

      Steps to Reproduce:

          1. Install MCE 2.7 on a 4.17 IPv6 disconnected BM hub cluster
          2. Install a 4.17 agent-based hosted cluster and scale up the nodepool
          3. After the worker nodes are installed, attempt to install the MetalLB operator to handle ingress
          

      Actual results:

      MetalLB operator cannot be installed because pods cannot connect to the cluster apiserver.

      Expected results:

      Pods in the cluster can connect to apiserver. 

      Additional info:

       

       

              jparrill@redhat.com Juan Manuel Parrilla Madrid
              openshift-crt-jira-prow OpenShift Prow Bot
              Liangquan Li
              Darragh Fitzmaurice
              Lubov Shilin