OpenShift Bugs / OCPBUGS-42783

HCP unable to pull images from registries only accessible from worker nodes

    • Sprint: Hypershift Sprint 260
    • Release note text:
      * Previously, when you tried to import an image in a hosted cluster where only the workers had access to the image registry, the `oc import-image` command failed because the registry could not be reached. With this release, the proxy that is used by the OpenShift API server in the management cluster is configured to resolve names by using the data plane. As a result, the `oc import-image` command works with the private registry. (link:https://issues.redhat.com/browse/OCPBUGS-42783[*OCPBUGS-42783*])
    • Bug Fix
    • Done

      Context
      Some ROSA HCP users host their own container registries (for example, self-hosted Quay servers) that are accessible only from inside their VPCs. This is often achieved with private DNS zones that resolve non-public domains such as quay.mycompany.intranet to non-public IP addresses. The private registries at those addresses then present self-signed SSL certificates that the client can validate against the HCP's additional CA trust bundle.
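      A minimal Python sketch (standard library only, assuming Python 3.9+) of the two client-side properties this setup relies on: a registry hostname that resolves only to non-public addresses, and a TLS context that trusts the system roots plus an additional CA bundle. `resolves_only_to_private`, `tls_context_with_extra_ca`, and the `extra_ca_pem` parameter are hypothetical names; the PEM text stands in for the HCP's additional trust bundle, and this is not how openshift-apiserver loads its trust store.

```python
import ipaddress
import socket
import ssl
from typing import Optional


def resolves_only_to_private(host: str, port: int = 443) -> bool:
    """Return True if every address `host` resolves to is non-public.

    Raises socket.gaierror if the name cannot be resolved at all, which is what
    a client outside the private DNS zone would typically see.
    """
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # info[4][0] is the address string for both IPv4 and IPv6 results.
    addrs = {ipaddress.ip_address(info[4][0]) for info in infos}
    return bool(addrs) and all(addr.is_private for addr in addrs)


def tls_context_with_extra_ca(extra_ca_pem: Optional[str] = None) -> ssl.SSLContext:
    """Return a verifying TLS context: system roots plus an optional extra CA bundle.

    Certificate and hostname verification stay enabled, so a self-signed
    registry certificate is normally accepted only if it, or the CA that
    signed it, is included in `extra_ca_pem`.
    """
    ctx = ssl.create_default_context()
    if extra_ca_pem:
        # PEM text, e.g. the contents of an additional trust bundle (assumption).
        ctx.load_verify_locations(cadata=extra_ca_pem)
    return ctx
```

      For example, `resolves_only_to_private("quay.mycompany.intranet")` would be expected to return True on a worker node whose resolver serves the private zone, and to raise `socket.gaierror` anywhere the zone is not visible.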

      Problem Description
      A user of a ROSA HCP cluster configured as described above encounters errors when attempting to import a container image from their private registry into their HCP's internal registry with oc import-image. Originally, these errors appeared in the openshift-apiserver logs as DNS resolution errors (OCPBUGS-36944). After the user upgraded their cluster to 4.14.37, which fixes OCPBUGS-36944, openshift-apiserver resolves the domain name correctly but now reports HTTP 502 Bad Gateway errors. We suspect these 502 Bad Gateway errors come from the Konnectivity agent while it proxies traffic between the control and data planes.
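      To show why an unreachable target can surface as "Bad Gateway" rather than as a DNS or TLS error, here is a toy HTTP CONNECT proxy in Python. It is a generic sketch, not how the Konnectivity agent or server is implemented; `ConnectProxy` and `_pump` are hypothetical names. The point it illustrates is that the proxy, not the client, resolves and dials the target, so a name or address reachable only from another network fails on the proxy's side and reaches the client only as an HTTP 502.

```python
import http.server
import socket
import threading


def _pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until EOF or error, then half-close dst."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


class ConnectProxy(http.server.BaseHTTPRequestHandler):
    """Toy HTTP CONNECT proxy (hypothetical; no auth, no access control)."""

    def do_CONNECT(self) -> None:
        host, _, port = self.path.rpartition(":")
        try:
            # Name resolution and the TCP dial both happen on the proxy's
            # network, not on the client's.
            upstream = socket.create_connection((host, int(port)), timeout=5)
        except (OSError, ValueError):
            # Target unreachable from here: all the client ever sees is a 502.
            self.send_error(502, "Bad Gateway")
            return
        self.send_response(200, "Connection Established")
        self.end_headers()
        with upstream:
            upstream.settimeout(None)
            # Relay bytes in both directions until each side finishes sending.
            back = threading.Thread(
                target=_pump, args=(upstream, self.connection), daemon=True
            )
            back.start()
            _pump(self.connection, upstream)
            back.join()

    def log_message(self, *args) -> None:
        pass  # keep the sketch quiet
```

      When a client tunnels through this proxy to a target the proxy cannot reach (for example, a closed local port), Python's `http.client` raises an `OSError` reporting "502 Bad Gateway". That matches the shape of the symptom here without proving its cause.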

      We've confirmed that the private registry is accessible from the HCP data plane (worker nodes) and that the certificate presented by the registry can be validated against the cluster's additional trust bundle. In other words, running curl against the private registry from a worker node returns HTTP 200 OK, but the same request from a control plane node returns HTTP 502. Notably, this cluster is not configured with a cluster-wide proxy, and the user's VPC does not include a transparent proxy.
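      The worker-node versus control-plane comparison can be scripted as a small probe. In this hedged Python sketch, `probe_registry` is a hypothetical helper that sends GET to the registry's `/v2/` endpoint, deliberately ignores environment proxy variables (the cluster has no cluster-wide proxy), and returns the HTTP status, or the error text when no HTTP response arrives. Per the Docker Registry HTTP API V2, `/v2/` normally answers 200, or 401 when authentication is required. For a registry with a self-signed certificate, pass a context that trusts the additional CA bundle.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional, Union


def probe_registry(base_url: str, ctx: Optional[ssl.SSLContext] = None,
                   timeout: float = 10.0) -> Union[int, str]:
    """GET <base_url>/v2/ directly and return the HTTP status code.

    An empty ProxyHandler disables proxies from the environment. If no HTTP
    response is received (DNS, TCP, or TLS failure), the error text is
    returned instead of a status code.
    """
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),
        urllib.request.HTTPSHandler(context=ctx or ssl.create_default_context()),
    )
    try:
        with opener.open(base_url.rstrip("/") + "/v2/", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses: a 401 still means the registry itself answered.
        return err.code
    except urllib.error.URLError as err:
        return str(err.reason)
```

      Running `probe_registry("https://quay.mycompany.intranet", ctx)` with the same trust context from a worker node and from the control plane side would be one way to reproduce the 200-versus-502 comparison described above.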

      Version-Release number of selected component
      OCP v4.14.37

      How reproducible
      Reliably reproducible, although the network configuration (see Context above) is quite specific.

      Steps to Reproduce

      1. Run the following command from the HCP data plane:
        oc import-image imagegroup/imagename:v1.2.3 --from=quay.mycompany.intranet/imagegroup/imagename:v1.2.3 --confirm

      2. Observe the command output, the resulting ImageStream object, and the openshift-apiserver logs.

      Actual Results

      error: tag v1.2.3 failed: Internal error occurred: quay.mycompany.intranet/imagegroup/imagename:v1.2.3: Get "https://quay.mycompany.intranet/v2/": Bad Gateway
      imagestream.image.openshift.io/imagename imported with errors
      
      Name:            imagename
      Namespace:        mynamespace
      Created:        Less than a second ago
      Labels:            <none>
      Annotations:        openshift.io/image.dockerRepositoryCheck=2024-10-01T12:46:02Z
      Image Repository:    default-route-openshift-image-registry.apps.rosa.clustername.abcd.p1.openshiftapps.com/mynamespace/imagename
      Image Lookup:        local=false
      Unique Images:        0
      Tags:            1
      
      v1.2.3
        tagged from quay.mycompany.intranet/imagegroup/imagename:v1.2.3
      
        ! error: Import failed (InternalError): Internal error occurred: quay.mycompany.intranet/imagegroup/imagename:v1.2.3: Get "https://quay.mycompany.intranet/v2/": Bad Gateway
            Less than a second ago
      
      error: imported completed with errors
      

      Expected Results
      The desired container image is imported from the private external image registry into the cluster's internal image registry without error.

              Cesar Wong (cewong@redhat.com)
              Anthony Byrne (abyrne.openshift)
              Jie Zhao
