OpenShift Bugs / OCPBUGS-61407

OAuth Route not working with PublicAndPrivate endpoint access and no hostname defined


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Versions: 4.14.z, 4.15.z, 4.16.z, 4.17.z, 4.18.z, 4.19.z
    • Component: HyperShift
    • Quality / Stability / Reliability
    • Severity: Important
    • Status: In Progress
    • Release Note Type: Bug Fix
    • Release Note Text:
      *Cause*: When the Kube API server service publishing strategy is LoadBalancer with PublicAndPrivate endpoint access, the OAuth Route is admitted by a private router even though a DNS record is not registered by external-dns.
      *Consequence*: The Route URL is not properly resolved by the private router, which makes the OAuth Server inaccessible. The console cluster operator also fails to start and the hosted cluster installation fails.
      *Fix*: The OAuth Route is admitted by the private router only when external-dns is defined. Otherwise the Route is admitted by the router in the management cluster.
      *Result*: The OAuth Route is accessible. The console cluster operator starts properly and the hosted cluster installation succeeds.

      This is a clone of issue OCPBUGS-56914. The following is the description of the original issue:

      Description of problem:

      This is similar to HOSTEDCP-672, but here the hostname for OAuth is not defined in the HostedCluster config.

      Create a HostedCluster on AWS with "PublicAndPrivate" endpoint access and the following configuration. The OAuth route should be externally accessible; however, it is not exposed:

        services:
        - service: APIServer
          servicePublishingStrategy:
            type: LoadBalancer
        - service: OAuthServer
          servicePublishingStrategy:
            type: Route
        - service: Konnectivity
          servicePublishingStrategy:
            type: Route
        - service: Ignition
          servicePublishingStrategy:
            type: Route
      
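      For contrast, HOSTEDCP-672 covered the case where a hostname is explicitly defined for the OAuth route, in which case external-dns registers a DNS record for it. Below is a minimal sketch of that variant, assuming the route publishing strategy's hostname field; the domain oauth.hc1.example.com is made up and would have to be one that external-dns manages:

        services:
        - service: OAuthServer
          servicePublishingStrategy:
            type: Route
            route:
              # example hostname; external-dns must manage this domain
              hostname: oauth.hc1.example.com
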

      There are a couple of issues:

      1) The OAuth public route is admitted by the "private" router, but the Route does not get routerCanonicalHostname in its status (see the illustration after this list). This was resolved earlier in HOSTEDCP-672 but was reintroduced later during refactoring.

      2) The public route is admitted by the private router even though the DNS record is not registered in external-dns (in the hypershift namespace), so the URL is not properly resolved by the private router. The route URL (which looks like this: oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com) can only be routed by the router in the management cluster: there is a wildcard A record in AWS, "*.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com", which points to the default router of the management cluster. The route should be admitted by the private router only when external-dns is used (i.e. a hostname is defined for the OAuth route in the HC config).
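
      For reference, a route admitted by the management cluster's default router would carry routerCanonicalHostname in its status, which is the field issue 1 describes as missing. This is a hedged illustration only; the routerName and canonical hostname values below are assumptions, not taken from this cluster:

        status:
          ingress:
          - conditions:
            - status: "True"
              type: Admitted
            host: oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com
            # assumed values for illustration
            routerCanonicalHostname: router-default.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com
            routerName: default
            wildcardPolicy: None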

      Version-Release number of selected component (if applicable):

          4.19

      How reproducible:

          Always

      Steps to Reproduce:

          1. Create a HostedCluster with the configuration above

      Actual results:

      The console ClusterOperator gets stuck:

      ᐅ oc get co console
      NAME      VERSION                                    AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      console   4.19.0-0.nightly-multi-2025-05-27-033815   False       True          True       7m33s   DeploymentAvailable: 0 replicas available for console deployment...

      The console pods in openshift-console never become ready:

      ᐅ oc get pods -n openshift-console
      NAME                         READY   STATUS    RESTARTS      AGE
      console-645598cb65-ln549     0/1     Running   1 (87s ago)   6m28s
      console-6c4bf8cc5c-ttxrm     0/1     Running   0             3m50s 

      It logs the following error:

      W0527 10:01:44.477840       1 authoptions.go:112] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!
      I0527 10:01:44.518839       1 envvar.go:172] "Feature gate default state" feature="ClientsAllowCBOR" enabled=false
      I0527 10:01:44.518862       1 envvar.go:172] "Feature gate default state" feature="ClientsPreferCBOR" enabled=false
      I0527 10:01:44.518869       1 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
      I0527 10:01:44.518876       1 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
      F0527 10:06:44.519958       1 authoptions.go:332] Error initializing authenticator: failed to construct OAuth endpoint cache: failed to setup an async cache - caching func returned error: context deadline exceeded 

      The console fails to access the OAuth route; this was verified manually:

      ᐅ curl -v -k https://oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com
      * Host oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com:443 was resolved.
      * IPv6: (none)
      * IPv4: 44.208.177.198, 52.45.137.74
      *   Trying 44.208.177.198:443...
      * Connected to oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com (44.208.177.198) port 443
      * ALPN: curl offers h2,http/1.1
      * TLSv1.3 (OUT), TLS handshake, Client hello (1):
      * TLSv1.3 (IN), TLS handshake, Server hello (2):
      * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
      * TLSv1.3 (IN), TLS handshake, Certificate (11):
      * TLSv1.3 (IN), TLS handshake, CERT verify (15):
      * TLSv1.3 (IN), TLS handshake, Finished (20):
      * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
      * TLSv1.3 (OUT), TLS handshake, Finished (20):
      * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 / x25519 / RSASSA-PSS
      * ALPN: server did not agree on a protocol. Uses default.
      * Server certificate:
      *  subject: O=openshift; CN=openshift-ingress
      *  start date: May 29 10:32:56 2025 GMT
      *  expire date: May 29 10:32:56 2026 GMT
      *  issuer: OU=openshift; CN=root-ca
      *  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
      *   Certificate level 0: Public key type RSA (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
      * using HTTP/1.x
      > GET / HTTP/1.1
      > Host: oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com
      > User-Agent: curl/8.6.0
      > Accept: */*
      > 
      * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
      * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
      * old SSL session ID is stale, removing
      * HTTP 1.0, assume close after body
      < HTTP/1.0 503 Service Unavailable 

      The route looks like this:

      ᐅ oc get route oauth -oyaml
      apiVersion: route.openshift.io/v1
      kind: Route
      metadata:
        annotations:
          openshift.io/host.generated: "true"
        creationTimestamp: "2025-05-29T11:01:10Z"
        labels:
          hypershift.openshift.io/hosted-control-plane: clusters-hc1
        name: oauth
        namespace: clusters-hc1
        ownerReferences:
        - apiVersion: hypershift.openshift.io/v1beta1
          blockOwnerDeletion: true
          controller: true
          kind: HostedControlPlane
          name: hc1
          uid: 3b4bd993-c269-4a46-a36b-909aaed2a1af
        resourceVersion: "16805"
        uid: 19457574-f5b0-4626-854f-2b2219a67dfa
      spec:
        host: oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com
        tls:
          insecureEdgeTerminationPolicy: None
          termination: passthrough
        to:
          kind: Service
          name: oauth-openshift
          weight: 100
        wildcardPolicy: None
      status:
        ingress:
        - conditions:
          - lastTransitionTime: "2025-05-29T11:01:35Z"
            status: "True"
            type: Admitted
          host: oauth-clusters-hc1.apps.mgencur-mgmt.mgencur.hypershift.devcluster.openshift.com
          routerName: router
          wildcardPolicy: None 

      The service is there:

      ᐅ oc get svc oauth-openshift -oyaml                                                     
      apiVersion: v1
      kind: Service
      metadata:
        creationTimestamp: "2025-05-29T11:01:10Z"
        name: oauth-openshift
        namespace: clusters-hc1
        ownerReferences:
        - apiVersion: hypershift.openshift.io/v1beta1
          blockOwnerDeletion: true
          controller: true
          kind: HostedControlPlane
          name: hc1
          uid: 3b4bd993-c269-4a46-a36b-909aaed2a1af
        resourceVersion: "16565"
        uid: 491e8c75-bf05-4737-8a58-6f608b73f302
      spec:
        clusterIP: 172.31.190.108
        clusterIPs:
        - 172.31.190.108
        internalTrafficPolicy: Cluster
        ipFamilies:
        - IPv4
        ipFamilyPolicy: PreferDualStack
        ports:
        - port: 6443
          protocol: TCP
          targetPort: 6443
        selector:
          app: oauth-openshift
          hypershift.openshift.io/control-plane-component: oauth-openshift
        sessionAffinity: None
        type: ClusterIP
      status:
        loadBalancer: {} 

      Accessing the Service directly from a sibling Pod works; it is only access through the Route that fails. For example, curl from a sibling Pod reaches the OAuth server:

      /home/curl_user $ curl -k https://172.31.190.108:6443
      {
        "kind": "Status",
        "apiVersion": "v1",
        "metadata": {},
        "status": "Failure",
        "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
        "reason": "Forbidden",
        "details": {},
        "code": 403
      } 

      Expected results:

          The hosted cluster starts successfully

      Additional info:

          

              Assignee: Martin Gencur (mgencur@redhat.com)
              Reporter: Martin Gencur (mgencur@redhat.com)