OCPBUGS-42783

HCP unable to pull images from registries only accessible from worker nodes


    • Hypershift Sprint 260
    • 1
    • False
    • Release note text:
      *cause* - A customer is trying to import an image in a hosted cluster in which only the workers have access to the image registry.
      *consequence* - The `oc import-image` command fails because the registry cannot be reached.
      *fix* - The proxy used by the openshift-apiserver in the management cluster needed to be configured to resolve names using the data plane.
      *result* - The `oc import-image` command works with the private registry.
    • Bug Fix
    • In Progress

      Context
      Some ROSA HCP users host their own container registries (e.g., self-hosted Quay servers) that are only accessible from inside their VPCs. This is often achieved with private DNS zones that resolve non-public domains such as quay.mycompany.intranet to non-public IP addresses. The private registries at those addresses then present self-signed SSL certificates to the client, which can be validated against the HCP's additional CA trust bundle.

      Problem Description
      A user of a ROSA HCP cluster with the configuration described above encounters errors when attempting to import a container image from their private registry into their HCP's internal registry via oc import-image. Originally, these errors appeared in the openshift-apiserver logs as DNS resolution errors (see OCPBUGS-36944). After the user upgraded their cluster to 4.14.37, which fixes OCPBUGS-36944, openshift-apiserver was able to resolve the domain name but now reports HTTP 502 Bad Gateway errors. We suspect these 502 errors come from the Konnectivity-agent as it proxies traffic between the control plane and the data plane.

      We've confirmed that the private registry is accessible from the HCP data plane (worker nodes) and that the certificate presented by the registry can be validated against the cluster's additional trust bundle. In other words, curl-ing the private registry from a worker node returns an HTTP 200 OK, but doing the same from a control plane node returns an HTTP 502. Notably, this cluster is not configured with a cluster-wide proxy, nor does the user's VPC feature a transparent proxy.
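The worker-versus-control-plane comparison described above can be sketched as a small shell helper. This is a minimal sketch, not the procedure used during the investigation: the function names `probe_registry` and `classify_status` are hypothetical, the CA bundle path is supplied by the caller, and the meaning attached to each status code is an assumption based on the behavior reported in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names and the status-code mapping are illustrative.

# Print only the HTTP status code returned by the registry's /v2/ endpoint.
# curl prints 000 when it received no HTTP response (e.g. DNS or connect failure).
probe_registry() {
  local host="$1" ca_bundle="$2"
  curl --silent --output /dev/null --write-out '%{http_code}' \
       --cacert "$ca_bundle" --max-time 10 "https://${host}/v2/"
}

# Map a status code to a rough interpretation for this scenario.
classify_status() {
  case "$1" in
    200|401) echo "reachable" ;;    # the registry itself answered (401 = auth required)
    502)     echo "bad-gateway" ;;  # an intermediary could not reach the registry
    000)     echo "no-response" ;;  # no HTTP response at all
    *)       echo "other" ;;
  esac
}
```

Running `classify_status "$(probe_registry quay.mycompany.intranet /path/to/ca-bundle.pem)"` from a worker node and again from the control plane side would make the difference between the two network paths explicit. The bundle path here is a placeholder.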

      Version-Release number of selected component
      OCP v4.14.37

      How reproducible
      Reliably reproducible, although the required network configuration (see Context above) is quite specific.

      Steps to Reproduce

      1. Run the following command from the HCP data plane
        oc import-image imagegroup/imagename:v1.2.3 --from=quay.mycompany.intranet/imagegroup/imagename:v1.2.3 --confirm
        
      2. Observe the command output, the resulting ImageStream object, and openshift-apiserver logs
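Step 2 can be partly scripted. The sketch below reads the import condition messages recorded on the ImageStream and sorts them into rough failure classes. The function names `import_messages` and `classify_import_error` are hypothetical, and the substring patterns are assumptions: "Bad Gateway" is taken from the output in this issue, while "no such host" and "x509" are the usual Go error texts for DNS and certificate failures.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a failed image import.

# Print the condition messages stored on the ImageStream's tags.
# Requires a logged-in oc session against the hosted cluster.
import_messages() {
  local namespace="$1" name="$2"
  oc get imagestream "$name" --namespace "$namespace" \
     --output jsonpath='{.status.tags[*].conditions[*].message}'
}

# Classify an import error message by the failure text it contains.
classify_import_error() {
  case "$1" in
    *"Bad Gateway"*)  echo "proxy" ;;    # HTTP 502 from something on the path
    *"no such host"*) echo "dns" ;;      # name could not be resolved
    *"x509"*)         echo "tls" ;;      # certificate validation failed
    *)                echo "unknown" ;;
  esac
}
```

For example, `classify_import_error "$(import_messages mynamespace imagename)"` would print `proxy` for the failure shown under Actual Results.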

      Actual Results

      error: tag v1.2.3 failed: Internal error occurred: quay.mycompany.intranet/imagegroup/imagename:v1.2.3: Get "https://quay.mycompany.intranet/v2/": Bad Gateway
      imagestream.image.openshift.io/imagename imported with errors
      
      Name:             imagename
      Namespace:        mynamespace
      Created:          Less than a second ago
      Labels:           <none>
      Annotations:      openshift.io/image.dockerRepositoryCheck=2024-10-01T12:46:02Z
      Image Repository: default-route-openshift-image-registry.apps.rosa.clustername.abcd.p1.openshiftapps.com/mynamespace/imagename
      Image Lookup:     local=false
      Unique Images:    0
      Tags:             1
      
      v1.2.3
        tagged from quay.mycompany.intranet/imagegroup/imagename:v1.2.3
      
        ! error: Import failed (InternalError): Internal error occurred: quay.mycompany.intranet/imagegroup/imagename:v1.2.3: Get "https://quay.mycompany.intranet/v2/": Bad Gateway
            Less than a second ago
      
      error: imported completed with errors
      

      Expected Results
      The desired container image is imported from the private external image registry into the cluster's internal image registry without error.

              cewong@redhat.com Cesar Wong
              abyrne.openshift Anthony Byrne
              Jie Zhao Jie Zhao