OpenShift Bugs / OCPBUGS-11583

Installation is failing when providing cgroup v2 config in manifest


Details

    • Type: Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • Affects Version: 4.13.0
    • Component: Node / Kubelet
    • None
    • No
    • False

    Description

      Description of problem:

      Installing 4.13 from a nightly build fails when the following cgroup v2 config is placed in the manifest directory.
      
      $ cat cluster-cgroup-v2.yaml
      apiVersion: config.openshift.io/v1
      kind: Node
      metadata:
        name: cluster
        spec:
          cgroupMode: "v2"

      Version-Release number of selected component (if applicable):

      4.13

      How reproducible:

      Always

      Steps to Reproduce:

      1. Install a 4.13 cluster with the following cgroup v2 config placed in the manifest directory.
      
      $ cat cluster-cgroup-v2.yaml
      apiVersion: config.openshift.io/v1
      kind: Node
      metadata:
        name: cluster
        spec:
          cgroupMode: "v2"  
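
      Note on the manifest as pasted: `spec` appears indented under `metadata`. This may be only a formatting artifact of the paste, but if the file on disk has the same indentation, `cgroupMode` would be read as part of `metadata` rather than `spec`. For comparison, here is a sketch of the layout with `spec` as a top-level key, which is how the `nodes.config.openshift.io` resource is structured. This is an assumption about the intended manifest, not a confirmed root cause of the failure.

      ```yaml
      # cluster-cgroup-v2.yaml (assumed intended layout; not confirmed as the fix)
      apiVersion: config.openshift.io/v1
      kind: Node
      metadata:
        name: cluster
      # spec is a sibling of metadata, not a child of it
      spec:
        cgroupMode: "v2"
      ```

      If the installation still fails with this layout, the indentation can be ruled out as the cause.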

      Actual results:

      % ./openshift-install create cluster
      INFO Consuming Common Manifests from target directory 
      INFO Consuming Openshift Manifests from target directory 
      INFO Consuming OpenShift Install (Manifests) from target directory 
      INFO Consuming Master Machines from target directory 
      INFO Consuming Worker Machines from target directory 
      INFO Credentials loaded from the "default" profile in file "/Users/sunilc/.aws/credentials" 
      INFO Creating infrastructure resources...         
      INFO Waiting up to 20m0s (until 3:44PM) for the Kubernetes API at https://api.sunilc413c.qe.devcluster.openshift.com:6443... 
      INFO API v1.26.2+7195e44 up                       
      INFO Waiting up to 30m0s (until 3:55PM) for bootstrapping to complete... 
      INFO Pulling VM console logs                      
      INFO Pulling debug logs from the bootstrap machine 
      ERROR Attempted to gather debug logs after installation failure: failed to create SSH client: failed to use pre-existing agent, make sure the appropriate keys exist in the agent for authentication: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain 
      ERROR Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::OAuthServerServiceEndpointAccessibleController_SyncError::OAuthServerServiceEndpointsEndpointAccessibleController_SyncError: IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server 
      ERROR OAuthServerServiceEndpointAccessibleControllerDegraded: Get "https://172.30.200.39:443/healthz": dial tcp 172.30.200.39:443: connect: connection refused 
      ERROR OAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: oauth service endpoints are not ready 
      ERROR Cluster operator authentication Available is False with APIServices_Error::OAuthServerServiceEndpointAccessibleController_EndpointUnavailable::OAuthServerServiceEndpointsEndpointAccessibleController_ResourceNotFound: APIServicesAvailable: "oauth.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request 
      ERROR APIServicesAvailable: "user.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request 
      ERROR OAuthServerServiceEndpointAccessibleControllerAvailable: Get "https://172.30.200.39:443/healthz": dial tcp 172.30.200.39:443: connect: connection refused 
      ERROR OAuthServerServiceEndpointsEndpointAccessibleControllerAvailable: endpoints "oauth-openshift" not found 
      INFO Cluster operator baremetal Disabled is False with :  
      INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected 
      INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected 
      INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected 
      INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected 
      INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required 
      ERROR Cluster operator ingress Available is False with IngressUnavailable: The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending) 
      INFO Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available. 
      ERROR Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending), CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing) 
      INFO Cluster operator ingress EvaluationConditionsDetected is False with AsExpected:  
      INFO Cluster operator insights ClusterTransferAvailable is False with NoClusterTransfer: no available cluster transfer 
      INFO Cluster operator insights Disabled is False with AsExpected:  
      INFO Cluster operator insights SCAAvailable is False with NotFound: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 404: {"code":"ACCT-MGMT-7","href":"/api/accounts_mgmt/v1/errors/7","id":"7","kind":"Error","operation_id":"33c8ee81-16c0-4852-b0ca-e95603046f4e","reason":"The organization (id= 24JqmM1UPqhIJcevZ4GrFb6ocrD) does not have any certificate of type sca. Enable SCA at https://access.redhat.com/management."} 
      ERROR Cluster operator monitoring Available is False with UpdatingPrometheusOperatorFailed: reconciling Prometheus Operator Deployment failed: creating Deployment object failed: Internal error occurred: admission plugin "image.openshift.io/ImagePolicy" failed to complete mutation in 13s 
      ERROR Cluster operator monitoring Degraded is True with UpdatingPrometheusOperatorFailed: reconciling Prometheus Operator Deployment failed: creating Deployment object failed: Internal error occurred: admission plugin "image.openshift.io/ImagePolicy" failed to complete mutation in 13s 
      INFO Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack. 
      INFO Cluster operator network ManagementStateDegraded is False with :  
      ERROR Cluster operator openshift-apiserver Available is False with APIServices_Error: APIServicesAvailable: "apps.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request 
      ERROR APIServicesAvailable: "authorization.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request 
      ERROR APIServicesAvailable: "build.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request 
      ERROR APIServicesAvailable: "image.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request 
      ERROR APIServicesAvailable: "project.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request 
      ERROR APIServicesAvailable: "quota.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request 
      ERROR APIServicesAvailable: "route.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request 
      ERROR APIServicesAvailable: "security.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request 
      ERROR APIServicesAvailable: "template.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request 
      ERROR Cluster operator operator-lifecycle-manager-packageserver Available is False with ClusterServiceVersionNotSucceeded: ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallCheckFailed, message: install timeout 
      INFO Cluster operator operator-lifecycle-manager-packageserver Progressing is True with : Working toward 0.19.0 
      ERROR Bootstrap failed to complete: timed out waiting for the condition 
      ERROR Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane. 

      Expected results:

      Installation should succeed 

      Additional info:

       

          People

            svanka@redhat.com Sai Ramesh Vanka
            schoudha Sunil Choudhary
            Sunil Choudhary Sunil Choudhary