OCPBUGS-50606

[4.19 HCP only] HyperShift hosted cluster kubeadmin login always fails with "Login failed (401 Unauthorized)"

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version: 4.19.0
    • Component: HyperShift
    • Severity: Critical

      Description of problem:

      In a 4.19 HyperShift hosted cluster, kubeadmin login always fails.
      A 4.18 HyperShift hosted cluster (management cluster and hosted cluster are both 4.18) does not have the issue.
      

      Version-Release number of selected component (if applicable):

      MGMT cluster version and hosted cluster version both are 4.19.0-0.nightly-2025-02-11-161912
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Launch a 4.19 HyperShift management cluster and a hosted cluster on it.
      2. Attempt kubeadmin login against the hosted cluster:
      $ export KUBECONFIG=/path/to/mgmt/kubeconfig
      $ oc get secret kubeadmin-password -n clusters-hypershift-ci-334742 -o 'jsonpath={ .data.password }' | base64 -d
      WJt9r-xxxxx-xxxxx-fpAMT
      
      $ export KUBECONFIG=/path/to/hosted-cluster/kubeconfig
      $ oc login -u kubeadmin -p "WJt9r-xxxxx-xxxxx-fpAMT"
      Login failed (401 Unauthorized)
      Verify you have provided the correct credentials.
      

      Actual results:

      HyperShift hosted cluster kubeadmin login always fails.
      

      Expected results:

      Success.
      

      Additional info:

      If I then configure an htpasswd IDP for the hosted cluster, the htpasswd user can log in successfully.
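      A sketch of that workaround (assumptions: the HostedCluster is named hypershift-ci-334742 in the clusters namespace to match the HCP namespace used above, the secret/user names are placeholders, and the IDP is set through spec.configuration.oauth on the HostedCluster):

      $ htpasswd -cbB /tmp/users.htpasswd testuser testpassword
      $ export KUBECONFIG=/path/to/mgmt/kubeconfig
      $ oc create secret generic htpass-secret -n clusters --from-file=htpasswd=/tmp/users.htpasswd
      $ oc patch hostedcluster hypershift-ci-334742 -n clusters --type merge \
          -p '{"spec":{"configuration":{"oauth":{"identityProviders":[{"name":"htpasswd-idp","mappingMethod":"claim","type":"HTPasswd","htpasswd":{"fileData":{"name":"htpass-secret"}}}]}}}}'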
      

            Comments:

            Xingxing Xia added a comment - Verified in management and hosted cluster envs of version 4.19.0-0.nightly-2025-02-23-235415. Now kubeadmin login succeeds in the hosted cluster.

            OpenShift Jira Bot added a comment - Hi rh-ee-mraee,

            Bugs should not be moved to Verified without first providing a Release Note Type ("Bug Fix" or "No Doc Update"), and for type "Bug Fix" the Release Note Text must also be provided. Please populate the necessary fields before moving the Bug to Verified.

            Seth Jennings added a comment - https://github.com/openshift/hypershift/pull/5286 is the source of the regression.

            In 4.18, the CPO creates the oauth-openshift deployment and the HCCO. The HCCO then reconciles the kubeadmin secret into the guest cluster and annotates the oauth-openshift deployment pod template with hypershift.openshift.io/kubeadmin-secret-hash, resulting in a rollout of the oauth-openshift deployment on the HCP after the kubeadmin secret is created in the guest cluster.

            In 4.19, the CPO creates the oauth-openshift deployment with the hypershift.openshift.io/kubeadmin-secret-hash annotation already set, so there is no rollout of the oauth-openshift deployment after the kubeadmin secret is created in the guest cluster, and the kubeadmin user is disabled.
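            One way to check whether the oauth-openshift pod template actually carries the annotation, and whether the kubeadmin secret made it into the guest cluster (a minimal diagnostic sketch; the HCP namespace is the example one from the reproduction steps above):

            $ export KUBECONFIG=/path/to/mgmt/kubeconfig
            # Look for the kubeadmin-secret-hash annotation on the oauth-openshift pod template
            $ oc get deployment oauth-openshift -n clusters-hypershift-ci-334742 -o yaml | grep kubeadmin-secret-hash

            $ export KUBECONFIG=/path/to/hosted-cluster/kubeconfig
            # Confirm the kubeadmin secret exists in the guest cluster
            $ oc get secret kubeadmin -n kube-system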

            Seth Jennings added a comment - In standalone, the cluster-authentication-operator watches the kube-system/kubeadmin secret and redeploys the oauth-server in the event that the kubeadmin user is added (within one hour of cluster creation) or removed.

            https://github.com/search?q=org%3Aopenshift%20NewBootstrapUserDataGetter&type=code

            https://github.com/openshift/cluster-authentication-operator/blob/11b2201203fd5ba0dec937015fa542f4cd4c8879/pkg/controllers/deployment/deployment_controller.go#L100-L105

            IsEnabled() returns (false, nil) when the secret creation timestamp is more than one hour after the kube-system namespace creation time:

            https://github.com/openshift/library-go/blob/80620876b7c2dbd5def6da140ad044e6f3de98b5/pkg/authentication/bootstrapauthenticator/bootstrap.go#L175-L177

            This is likely a race between the oauth-server coming up in the HCP and the HCCO reconciling the kube-system/kubeadmin secret into the guest cluster. The oauth-server does not watch the kubeadmin secret, so if the secret is created after the server's initial start, the kubeadmin identity provider is not enabled. My theory is that some timing changed in 4.19 that makes this issue more pronounced.
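            To check whether the one-hour window described above was exceeded in a hosted cluster, the two creation timestamps can be compared directly (a minimal sketch using the guest-cluster kubeconfig from the reproduction steps):

            $ export KUBECONFIG=/path/to/hosted-cluster/kubeconfig
            # Reference point used by the bootstrap check: kube-system namespace creation time
            $ oc get namespace kube-system -o 'jsonpath={.metadata.creationTimestamp}'
            # Creation time of the kubeadmin bootstrap secret
            $ oc get secret kubeadmin -n kube-system -o 'jsonpath={.metadata.creationTimestamp}'

            Per the library-go check linked above, a secret created more than one hour after the namespace causes IsEnabled() to return false.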

            Seth Jennings added a comment - I verified that the kubeadmin secret in the guest cluster kube-system namespace contains a valid bcrypt hash of the kubeadmin password contained in the kubeadmin-password secret in the mgmt cluster HCP namespace, but was still unable to log in.

            However, after restarting the oauth-openshift pod, I was able to log in. I'll investigate more tomorrow.
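            For reference, one way to repeat that check and apply the restart workaround (a hedged sketch: the HCP namespace is the example one from the reproduction steps, the secret data keys are those used by standalone OpenShift, and the bcrypt comparison uses the third-party Python bcrypt module):

            # Management cluster: plaintext kubeadmin password
            $ export KUBECONFIG=/path/to/mgmt/kubeconfig
            $ oc get secret kubeadmin-password -n clusters-hypershift-ci-334742 -o 'jsonpath={.data.password}' | base64 -d

            # Guest cluster: stored bcrypt hash
            $ export KUBECONFIG=/path/to/hosted-cluster/kubeconfig
            $ oc get secret kubeadmin -n kube-system -o 'jsonpath={.data.kubeadmin}' | base64 -d

            # Check that the hash matches the password (requires "pip install bcrypt")
            $ python3 -c 'import bcrypt, sys; print(bcrypt.checkpw(sys.argv[1].encode(), sys.argv[2].encode()))' '<password>' '<bcrypt-hash>'

            # Workaround observed above: force a rollout of the oauth-server
            $ export KUBECONFIG=/path/to/mgmt/kubeconfig
            $ oc rollout restart deployment/oauth-openshift -n clusters-hypershift-ci-334742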

            Seth Jennings added a comment - xxia-1 do we know when kubeadmin login started failing for 4.19? i.e. what was the last known good nightly build?
