Bug
Resolution: Unresolved
premerge
Critical
Approved
Release Note Not Required
In Progress
Description of problem:
This bug was found during pre-merge testing of the 4.18 epic AUTH-528 PRs and is filed for better tracking, per the existing "OpenShift - Testing Before PR Merges - Left-Shift Testing" Google doc workflow.
The KAS rollout gets stuck in CrashLoopBackOff in an https proxy env when OCP BYO external OIDC is configured with issuerCertificateAuthority set. "https proxy env" means an env that sets an https proxy and a CA configmap in "oc get proxy cluster -o yaml". Note: in a normal OCP env that does not set an https proxy, the KAS rollout completes successfully when BYO external OIDC is configured with the issuerCertificateAuthority field.
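To confirm that a cluster matches the "https proxy env" definition above, a small check along these lines may help. It is a minimal sketch that assumes the cluster-wide Proxy object exposes the proxy URL at spec.httpsProxy and the CA configmap name at spec.trustedCA.name; the function name is_https_proxy_env is made up for illustration.

```shell
# Sketch: succeed only if the cluster-wide Proxy object sets both an HTTPS proxy
# and a trusted CA configmap. The function name is hypothetical.
is_https_proxy_env() {
  https_proxy_url=$(oc get proxy cluster -o jsonpath='{.spec.httpsProxy}')
  trusted_ca=$(oc get proxy cluster -o jsonpath='{.spec.trustedCA.name}')
  [ -n "$https_proxy_url" ] && [ -n "$trusted_ca" ]
}
```

Usage: `is_https_proxy_env && echo "https proxy env" || echo "not an https proxy env"`.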
Version-Release number of selected component (if applicable):
The cluster-bot build created at 2024-11-15 10:37 CST (UTC+8): build openshift/cluster-authentication-operator#713,openshift/cluster-kube-apiserver-operator#1760
How reproducible:
Probably always: tried twice and hit it both times.
Steps to Reproduce:
1. Launch an https proxy cluster with TechPreviewNoUpgrade turned on using the above cluster-bot build.
2. Set up one self-signed external OIDC (e.g. using keycloak). In the keycloak admin UI, create a confidential client "console-test" with the right redirect callback URI set and a public client "oc-cli-test", and create a test user.
3. Configure OCP BYO external OIDC:
    ISSUER_URL=$KEYCLOAK_HOST/realms/master
    CONSOLE_CLIENT_ID=console-test
    CONSOLE_CLIENT_SECRET_VALUE="xxxxxxxx"
    CONSOLE_CLIENT_SECRET_NAME=console-secret
    CLI_CLIENT_ID=oc-cli-test
    AUDIENCE_1=$CONSOLE_CLIENT_ID
    AUDIENCE_2=$CLI_CLIENT_ID

    $ curl -sSI --cacert router-ca/ca-bundle.crt $KEYCLOAK_HOST/realms/master/.well-known/openid-configuration | head -n 1
    HTTP/1.1 200 OK

    oc create configmap keycloak-oidc-ca --from-file=ca-bundle.crt=router-ca/ca-bundle.crt -n openshift-config
    oc create secret generic $CONSOLE_CLIENT_SECRET_NAME --from-literal=clientSecret=$CONSOLE_CLIENT_SECRET_VALUE -n openshift-config
    oc patch authentication.config/cluster --type=merge -p="
    spec:
      oidcProviders:
      - claimMappings:
          groups:
            claim: groups
            prefix: 'oidc-groups-test:'
          username:
            claim: email
            prefixPolicy: Prefix
            prefix:
              prefixString: 'oidc-user-test:'
        issuer:
          audiences:
          - $AUDIENCE_1
          - $AUDIENCE_2
          issuerCertificateAuthority:
            name: keycloak-oidc-ca
          issuerURL: $ISSUER_URL
        name: keycloak-oidc-server
        oidcClients:
        - clientID: $CONSOLE_CLIENT_ID
          clientSecret:
            name: $CONSOLE_CLIENT_SECRET_NAME
          componentName: console
          componentNamespace: openshift-console
      type: OIDC
      webhookTokenAuthenticator: null
    "
4. Wait for KAS pods to finish rollout.
5. Wait for console pods to finish rollout.
6. Check co.
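Step 4 can be checked with a small helper instead of watching the pod list by hand. The following is a sketch only: it assumes the usual "oc get po --no-headers" column layout (NAME, READY, STATUS first, the revision label last), and the function name kas_rollout_done is hypothetical.

```shell
# Sketch: succeed only when every KAS pod is Running, fully ready,
# and all pods report the same revision label. Function name is hypothetical.
kas_rollout_done() {
  oc get po -n openshift-kube-apiserver -l apiserver -L revision --no-headers |
    awk '
      { split($2, ready, "/") }                 # READY column, e.g. 3/5
      $3 != "Running" || ready[1] != ready[2] { bad = 1 }
      { revs[$NF] = 1 }                         # revision is the last column
      END {
        n = 0
        for (r in revs) n++
        exit ((bad || n != 1) ? 1 : 0)          # fail on any bad pod or mixed revisions
      }'
}
```

Usage: `until kas_rollout_done; do sleep 30; done`. In the failing scenario described in this report the loop would never exit, so a timeout around it is advisable.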
Actual results:
Step 4:
KAS pods started to roll out but got stuck in CrashLoopBackOff:
    $ oc get po -n openshift-kube-apiserver -L revision -l apiserver
    NAME                                                 READY   STATUS             RESTARTS       AGE   REVISION
    kube-apiserver-xxia-hsprox-2-7njpg-control-plane-0   3/5     CrashLoopBackOff   .. (22s ago)   10m   9
    kube-apiserver-xxia-hsprox-2-7njpg-control-plane-1   5/5     Running            .              53m   8
    kube-apiserver-xxia-hsprox-2-7njpg-control-plane-2   5/5     Running            ..             50m   8
Step 5:
In the first https proxy env's test trial, console pods did not crash but did not roll out at all, which is a separate issue tracked by OCPBUGS-44556. In the second https proxy env's test trial, console pods crashed too:
    $ oc get pods -n openshift-console
    NAME                       READY   STATUS    RESTARTS      AGE
    console-846b86d967-8z25q   0/1     Running   3 (73s ago)   16m
    console-846b86d967-9kh5f   0/1     Running   3 (99s ago)   16m
Step 6:
    $ oc get co | grep -v 'True.*False.*False'
    NAME             VERSION                                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    authentication   4.18.0-0.ci.test-2024-11-15-025507-ci-ln-cjc0zl2-latest   True        False         True       4h50m   ExternalOIDCControllerDegraded: auth config validation failed: [could not validate IDP URL using the system CAs: GET well-known error: Get "https://keycloak-keycloak.apps.xxxxxxxx/realms/master/.well-known/openid-configuration": context deadline exceeded (Client.Timeout exceeded while awaiting headers)]
    console          4.18.0-0.ci.test-2024-11-15-025507-ci-ln-cjc0zl2-latest   True        True          False      4h56m   SyncLoopRefreshProgressing: working toward version 4.18.0-0.ci.test-2024-11-15-025507-ci-ln-cjc0zl2-latest, 1 replicas available
    kube-apiserver   4.18.0-0.ci.test-2024-11-15-025507-ci-ln-cjc0zl2-latest   True        True          True       5h9m    ConfigObservationDegraded: failed to get configmap openshift-config-managed/auth-config: configmap "auth-config" not found...
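To narrow down which container in the crashlooping KAS pod is failing, something like the following could be used. It is a sketch, not part of the original report: the function names are hypothetical, and the awk filter assumes STATUS is the third column of "oc get po --no-headers".

```shell
# Sketch: list KAS pods whose STATUS column is CrashLoopBackOff.
crashlooping_kas_pods() {
  oc get po -n openshift-kube-apiserver -l apiserver --no-headers |
    awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Sketch: print name, ready flag, and restart count for each container of pod $1.
container_statuses() {
  oc get pod "$1" -n openshift-kube-apiserver \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\n"}{end}'
}
```

Usage: `for p in $(crashlooping_kas_pods); do container_statuses "$p"; done`. Once the container with a growing restart count is identified, its last crash output can be read with `oc logs "$p" -n openshift-kube-apiserver -c <container> --previous`, where <container> is whichever name the listing points to.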
Expected results:
KAS, console, and authentication should not show the above issues in steps 4 to 6.
Additional info: