OpenShift Bugs / OCPBUGS-56914

OAuth Route not working when using PublicAndPrivate endpoint access and no hostname is defined


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Target Version: 4.20.0
    • Affects Versions: 4.14.z, 4.15.z, 4.16.z, 4.17.z, 4.18.z, 4.19.z
    • Component: HyperShift
    • Quality / Stability / Reliability
    • Severity: Important
    • Release Note Type: Bug Fix
    • Release Note Text: Before this update, if you tried to create a hosted cluster on AWS when the Kubernetes API server service publishing strategy was `LoadBalancer` with `PublicAndPrivate` endpoint access, a private router admitted the OAuth route even though the External DNS Operator did not register a DNS record. As a consequence, the private router did not properly resolve the route URL and the OAuth server was inaccessible. The Console Cluster Operator also failed to start, and the hosted cluster installation failed. With this release, a private router admits the OAuth route only when the external DNS is defined. Otherwise, the router admits the route in the management cluster. As a result, the OAuth route is accessible, the Console Cluster Operator properly starts, and the hosted cluster installation succeeds.

      Description of problem:

      This is similar to HOSTEDCP-672, but here no hostname is defined for OAuth in the current configuration.

      Create a HostedCluster on AWS with "PublicAndPrivate" endpoint access and the following service publishing configuration. The OAuth route should be externally accessible, but it is not exposed:

        services:
        - service: APIServer
          servicePublishingStrategy:
            type: LoadBalancer
        - service: OAuthServer
          servicePublishingStrategy:
            type: Route
        - service: Konnectivity
          servicePublishingStrategy:
            type: Route
        - service: Ignition
          servicePublishingStrategy:
            type: Route
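
      For contrast, a sketch of the configuration this report is distinguished from (the HOSTEDCP-672 case, where external-dns is used): the OAuthServer entry carries an explicit hostname under the Route strategy. The domain below is a placeholder, not taken from this cluster:

        - service: OAuthServer
          servicePublishingStrategy:
            type: Route
            route:
              hostname: oauth.hc1.example-externaldns-domain.com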
      

      There are a couple of issues:

      1) The OAuth public route is admitted by the "private" router, but the Route does not get routerCanonicalHostname in its status (see the checks below). This was resolved earlier in HOSTEDCP-672 but was reintroduced later by subsequent refactoring.

      2) The public route is admitted by the private router even though no DNS record is registered by external-dns (in the hypershift namespace), so the URL is not properly resolved by the private router. The route URL (which looks like this: oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com) can only be routed by the router in the management cluster. There is a wildcard A record in AWS, "*.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com", which points to the default router of the management cluster. The route should be admitted by the private router only when external-dns is used (that is, when a hostname is defined for the OAuth route in the HostedCluster configuration).
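
      Two quick checks for the issues above (sketches; the namespace and hostnames match this reproducer, and the router-default name is an assumption based on the usual default ingress naming):

      # Issue 1: which router admitted the route, and whether routerCanonicalHostname is set
      ᐅ oc get route oauth -n clusters-hc1 \
          -o jsonpath='{.status.ingress[*].routerName}{"\n"}{.status.ingress[*].routerCanonicalHostname}{"\n"}'

      # Issue 2: the route hostname resolves only via the wildcard *.apps record,
      # i.e. to the management cluster's default router, not to the private router
      ᐅ dig +short oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com
      ᐅ dig +short router-default.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com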

      Version-Release number of selected component (if applicable):

          4.19

      How reproducible:

          Always

      Steps to Reproduce:

          1. Create a HostedCluster with the configuration above.
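
      For reference, a sketch of creating such a cluster with the hcp CLI; apart from --endpoint-access PublicAndPrivate, all flag values below are placeholders:

      ᐅ hcp create cluster aws \
          --name hc1 \
          --namespace clusters \
          --pull-secret ~/pull-secret.json \
          --aws-creds ~/.aws/credentials \
          --base-domain example.devcluster.openshift.com \
          --region us-east-1 \
          --node-pool-replicas 2 \
          --endpoint-access PublicAndPrivate
      # Note: no --external-dns-domain is passed and no hostname is set for the OAuthServer
      # route, which is exactly the scenario this bug describes.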

      Actual results:

      The console ClusterOperator gets stuck:

          ᐅ oc get co console                  
      NAME      VERSION                                    AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      console   4.19.0-0.nightly-multi-2025-05-27-033815   False       True          True       7m33s   DeploymentAvailable: 0 replicas available for console deployment...

      The console pods in openshift-console are not ready:

      ᐅ oc get pods -n openshift-console
      NAME                         READY   STATUS    RESTARTS      AGE
      console-645598cb65-ln549     0/1     Running   1 (87s ago)   6m28s
      console-6c4bf8cc5c-ttxrm     0/1     Running   0             3m50s 

      It logs the following error:

      W0527 10:01:44.477840       1 authoptions.go:112] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!
      I0527 10:01:44.518839       1 envvar.go:172] "Feature gate default state" feature="ClientsAllowCBOR" enabled=false
      I0527 10:01:44.518862       1 envvar.go:172] "Feature gate default state" feature="ClientsPreferCBOR" enabled=false
      I0527 10:01:44.518869       1 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
      I0527 10:01:44.518876       1 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
      F0527 10:06:44.519958       1 authoptions.go:332] Error initializing authenticator: failed to construct OAuth endpoint cache: failed to setup an async cache - caching func returned error: context deadline exceeded 
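
      The endpoint the console is trying to cache is the OAuth metadata advertised by the hosted cluster's API server; it can be inspected directly (a sketch, assuming a kubeconfig for the hosted cluster in hc1-kubeconfig):

      ᐅ oc --kubeconfig hc1-kubeconfig get --raw /.well-known/oauth-authorization-server
      # The "issuer" field in the response points at the oauth route hostname checked below,
      # so the console times out while that route is unreachable.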

      The console fails to access the OAuth route; I verified this manually:

      ᐅ curl -v -k https://oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com
      * Host oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com:443 was resolved.
      * IPv6: (none)
      * IPv4: 44.208.177.198, 52.45.137.74
      *   Trying 44.208.177.198:443...
      * Connected to oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com (44.208.177.198) port 443
      * ALPN: curl offers h2,http/1.1
      * TLSv1.3 (OUT), TLS handshake, Client hello (1):
      * TLSv1.3 (IN), TLS handshake, Server hello (2):
      * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
      * TLSv1.3 (IN), TLS handshake, Certificate (11):
      * TLSv1.3 (IN), TLS handshake, CERT verify (15):
      * TLSv1.3 (IN), TLS handshake, Finished (20):
      * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
      * TLSv1.3 (OUT), TLS handshake, Finished (20):
      * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 / x25519 / RSASSA-PSS
      * ALPN: server did not agree on a protocol. Uses default.
      * Server certificate:
      *  subject: O=openshift; CN=openshift-ingress
      *  start date: May 29 10:32:56 2025 GMT
      *  expire date: May 29 10:32:56 2026 GMT
      *  issuer: OU=openshift; CN=root-ca
      *  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
      *   Certificate level 0: Public key type RSA (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
      * using HTTP/1.x
      > GET / HTTP/1.1
      > Host: oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com
      > User-Agent: curl/8.6.0
      > Accept: */*
      > 
      * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
      * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
      * old SSL session ID is stale, removing
      * HTTP 1.0, assume close after body
      < HTTP/1.0 503 Service Unavailable 

      The route looks like this:

      ᐅ oc get route oauth -oyaml
      apiVersion: route.openshift.io/v1
      kind: Route
      metadata:
        annotations:
          openshift.io/host.generated: "true"
        creationTimestamp: "2025-05-29T11:01:10Z"
        labels:
          hypershift.openshift.io/hosted-control-plane: clusters-hc1
        name: oauth
        namespace: clusters-hc1
        ownerReferences:
        - apiVersion: hypershift.openshift.io/v1beta1
          blockOwnerDeletion: true
          controller: true
          kind: HostedControlPlane
          name: hc1
          uid: 3b4bd993-c269-4a46-a36b-909aaed2a1af
        resourceVersion: "16805"
        uid: 19457574-f5b0-4626-854f-2b2219a67dfa
      spec:
        host: oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com
        tls:
          insecureEdgeTerminationPolicy: None
          termination: passthrough
        to:
          kind: Service
          name: oauth-openshift
          weight: 100
        wildcardPolicy: None
      status:
        ingress:
        - conditions:
          - lastTransitionTime: "2025-05-29T11:01:35Z"
            status: "True"
            type: Admitted
          host: oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com
          routerName: router
          wildcardPolicy: None 
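
      Note routerName: router in the status above: the route was admitted by the HyperShift private router in the control-plane namespace, not by the management cluster's default ingress. A sketch for confirming which routers are present (the deployment names are assumptions):

      ᐅ oc get deploy router -n clusters-hc1
      ᐅ oc get deploy router-default -n openshift-ingress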

      The service is there:

      ᐅ oc get svc oauth-openshift -oyaml                                                     
      apiVersion: v1
      kind: Service
      metadata:
        creationTimestamp: "2025-05-29T11:01:10Z"
        name: oauth-openshift
        namespace: clusters-hc1
        ownerReferences:
        - apiVersion: hypershift.openshift.io/v1beta1
          blockOwnerDeletion: true
          controller: true
          kind: HostedControlPlane
          name: hc1
          uid: 3b4bd993-c269-4a46-a36b-909aaed2a1af
        resourceVersion: "16565"
        uid: 491e8c75-bf05-4737-8a58-6f608b73f302
      spec:
        clusterIP: 172.31.190.108
        clusterIPs:
        - 172.31.190.108
        internalTrafficPolicy: Cluster
        ipFamilies:
        - IPv4
        ipFamilyPolicy: PreferDualStack
        ports:
        - port: 6443
          protocol: TCP
          targetPort: 6443
        selector:
          app: oauth-openshift
          hypershift.openshift.io/control-plane-component: oauth-openshift
        sessionAffinity: None
        type: ClusterIP
      status:
        loadBalancer: {} 

      Accessing the Service directly from a sibling Pod works (see below); only access through the Route fails. A curl from a sibling Pod reaches the OAuth Pod:

      /home/curl_user $ curl -k https://172.31.190.108:6443
      {
        "kind": "Status",
        "apiVersion": "v1",
        "metadata": {},
        "status": "Failure",
        "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
        "reason": "Forbidden",
        "details": {},
        "code": 403
      } 
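
      For reference, one way to run such a sibling test Pod in the control-plane namespace (a sketch; the image is an assumption and the ClusterIP/port are taken from the Service above):

      ᐅ oc run curl-test -n clusters-hc1 --rm -it --restart=Never \
          --image=curlimages/curl -- curl -k https://172.31.190.108:6443
      # The 403 "system:anonymous" response shows the oauth-openshift Service itself is
      # reachable; only access through the Route fails.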

      Expected results:

          The hosted cluster starts successfully

      Additional info:

          

              Assignee: Martin Gencur (mgencur@redhat.com)
              Reporter: Martin Gencur (mgencur@redhat.com)
              Votes: 0
              Watchers: 5