Bug
Resolution: Unresolved
Version: 4.13.0
Quality / Stability / Reliability
Description of problem:
While installing a 4.13 cluster from a nightly build with the cgroup v2 configuration below placed in the manifest directory, the installation fails.
$ cat cluster-cgroup-v2.yaml
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  name: cluster
spec:
  cgroupMode: "v2"
Version-Release number of selected component (if applicable):
4.13
How reproducible:
Always
Steps to Reproduce:
1. Install a 4.13 cluster with the cgroup v2 config below placed in the manifest directory (see the command sketch after the config).
$ cat cluster-cgroup-v2.yaml
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  name: cluster
spec:
  cgroupMode: "v2"
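For reference, the command sequence looks like the following (a sketch of the standard installer workflow; <install-dir> is a placeholder for the actual installation directory):
$ ./openshift-install create manifests --dir <install-dir>
$ cp cluster-cgroup-v2.yaml <install-dir>/manifests/
$ ./openshift-install create cluster --dir <install-dir>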
Actual results:
% ./openshift-install create cluster
INFO Consuming Common Manifests from target directory
INFO Consuming Openshift Manifests from target directory
INFO Consuming OpenShift Install (Manifests) from target directory
INFO Consuming Master Machines from target directory
INFO Consuming Worker Machines from target directory
INFO Credentials loaded from the "default" profile in file "/Users/sunilc/.aws/credentials"
INFO Creating infrastructure resources...
INFO Waiting up to 20m0s (until 3:44PM) for the Kubernetes API at https://api.sunilc413c.qe.devcluster.openshift.com:6443...
INFO API v1.26.2+7195e44 up
INFO Waiting up to 30m0s (until 3:55PM) for bootstrapping to complete...
INFO Pulling VM console logs
INFO Pulling debug logs from the bootstrap machine
ERROR Attempted to gather debug logs after installation failure: failed to create SSH client: failed to use pre-existing agent, make sure the appropriate keys exist in the agent for authentication: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
ERROR Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::OAuthServerServiceEndpointAccessibleController_SyncError::OAuthServerServiceEndpointsEndpointAccessibleController_SyncError: IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server
ERROR OAuthServerServiceEndpointAccessibleControllerDegraded: Get "https://172.30.200.39:443/healthz": dial tcp 172.30.200.39:443: connect: connection refused
ERROR OAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: oauth service endpoints are not ready
ERROR Cluster operator authentication Available is False with APIServices_Error::OAuthServerServiceEndpointAccessibleController_EndpointUnavailable::OAuthServerServiceEndpointsEndpointAccessibleController_ResourceNotFound: APIServicesAvailable: "oauth.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request
ERROR APIServicesAvailable: "user.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request
ERROR OAuthServerServiceEndpointAccessibleControllerAvailable: Get "https://172.30.200.39:443/healthz": dial tcp 172.30.200.39:443: connect: connection refused
ERROR OAuthServerServiceEndpointsEndpointAccessibleControllerAvailable: endpoints "oauth-openshift" not found
INFO Cluster operator baremetal Disabled is False with :
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
ERROR Cluster operator ingress Available is False with IngressUnavailable: The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)
INFO Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available.
ERROR Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
INFO Cluster operator ingress EvaluationConditionsDetected is False with AsExpected:
INFO Cluster operator insights ClusterTransferAvailable is False with NoClusterTransfer: no available cluster transfer
INFO Cluster operator insights Disabled is False with AsExpected:
INFO Cluster operator insights SCAAvailable is False with NotFound: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 404: {"code":"ACCT-MGMT-7","href":"/api/accounts_mgmt/v1/errors/7","id":"7","kind":"Error","operation_id":"33c8ee81-16c0-4852-b0ca-e95603046f4e","reason":"The organization (id= 24JqmM1UPqhIJcevZ4GrFb6ocrD) does not have any certificate of type sca. Enable SCA at https://access.redhat.com/management."}
ERROR Cluster operator monitoring Available is False with UpdatingPrometheusOperatorFailed: reconciling Prometheus Operator Deployment failed: creating Deployment object failed: Internal error occurred: admission plugin "image.openshift.io/ImagePolicy" failed to complete mutation in 13s
ERROR Cluster operator monitoring Degraded is True with UpdatingPrometheusOperatorFailed: reconciling Prometheus Operator Deployment failed: creating Deployment object failed: Internal error occurred: admission plugin "image.openshift.io/ImagePolicy" failed to complete mutation in 13s
INFO Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack.
INFO Cluster operator network ManagementStateDegraded is False with :
ERROR Cluster operator openshift-apiserver Available is False with APIServices_Error: APIServicesAvailable: "apps.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request
ERROR APIServicesAvailable: "authorization.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request
ERROR APIServicesAvailable: "build.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request
ERROR APIServicesAvailable: "image.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request
ERROR APIServicesAvailable: "project.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request
ERROR APIServicesAvailable: "quota.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request
ERROR APIServicesAvailable: "route.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request
ERROR APIServicesAvailable: "security.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request
ERROR APIServicesAvailable: "template.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request
ERROR Cluster operator operator-lifecycle-manager-packageserver Available is False with ClusterServiceVersionNotSucceeded: ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallCheckFailed, message: install timeout
INFO Cluster operator operator-lifecycle-manager-packageserver Progressing is True with : Working toward 0.19.0
ERROR Bootstrap failed to complete: timed out waiting for the condition
ERROR Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane.
Expected results:
Installation should succeed
Additional info:
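If a control plane node or the bootstrap host becomes reachable, the effective cgroup mode can be checked with the sketch below (<node-name> is a placeholder):
$ oc debug node/<node-name> -- chroot /host stat -fc %T /sys/fs/cgroup
cgroup2fs is expected when cgroupMode "v2" was applied; tmpfs indicates the node is still on cgroup v1.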