  1. OpenShift Bugs
  2. OCPBUGS-30235

OCP 4.16 installation fails because of MCO degradation

Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • None
    • 4.16
    • None
    • Important
    • No
    • Proposed
    • s390x
    • False

    Description

      OCP 4.16 installation is failing. Bootstrap fails because access is denied when performing create on Certificate via HTTP POST:

      level=info msg=Cluster operator insights SCAAvailable is False with Forbidden: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 403: {"code":"ACCT-MGMT-11","href":"/api/accounts_mgmt/v1/errors/11","id":"11","kind":"Error","operation_id":"d39d6133-da19-4ac0-bdc2-9af0d255f4f3","reason":"Account with ID 2DUeKzzTD9ngfsQ6YgkzdJn1jA4 denied access to perform create on Certificate with HTTP call POST /api/accounts_mgmt/v1/certificates"}
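      For triage, the JSON error body embedded in that log line can be pulled out and inspected. The following is a minimal sketch, assuming the installer log line keeps the `returned HTTP <status>: {json}` shape seen above; the function name `parse_ocm_error` is hypothetical and not part of any OpenShift tooling.

```python
import json
import re

# Log line copied verbatim from the report above.
LOG_LINE = (
    'level=info msg=Cluster operator insights SCAAvailable is False with Forbidden: '
    'Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: '
    'OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 403: '
    '{"code":"ACCT-MGMT-11","href":"/api/accounts_mgmt/v1/errors/11","id":"11",'
    '"kind":"Error","operation_id":"d39d6133-da19-4ac0-bdc2-9af0d255f4f3",'
    '"reason":"Account with ID 2DUeKzzTD9ngfsQ6YgkzdJn1jA4 denied access to perform '
    'create on Certificate with HTTP call POST /api/accounts_mgmt/v1/certificates"}'
)


def parse_ocm_error(line):
    """Return (http_status, error_dict) for a matching log line, else None.

    Hypothetical helper: assumes the JSON body is the last thing on the line.
    """
    match = re.search(r"returned HTTP (\d{3}): (\{.*\})\s*$", line)
    if match is None:
        return None
    return int(match.group(1)), json.loads(match.group(2))


result = parse_ocm_error(LOG_LINE)
if result is not None:
    status, error = result
    print(status, error["code"], error["kind"])
    print(error["reason"])
```

      Note that this message is logged at `level=info`, while the machine-config failure further down is logged at `level=error`.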
      
          

      After the installer waited 40 minutes for the cluster to initialize, multiple errors like the ones below were reported: many cluster operators were not available, and the server was reported as down or not responding.

      level=debug msg=Loading Agent Config...
      level=info msg=Waiting up to 40m0s (until 2:15PM UTC) for the cluster at https://api.libvirt-s390x-4-1-17572.libvirt-s390x-4-1.ci:6443 to initialize...
      level=debug msg=Still waiting for the cluster to initialize: Multiple errors are preventing progress:
      level=debug msg=* Cluster operators authentication, image-registry, ingress, insights, kube-apiserver, machine-api, machine-config, monitoring, openshift-apiserver, openshift-controller-manager, openshift-samples, operator-lifecycle-manager-packageserver are not available
      level=debug msg=* Could not update imagestream "openshift/driver-toolkit" (608 of 886): the server is down or not responding
      level=debug msg=* Could not update oauthclient "console" (546 of 886): the server does not recognize this resource, check extension API servers
      level=debug msg=* Could not update role "openshift-console-operator/prometheus-k8s" (804 of 886): resource may have been deleted
      level=debug msg=* Could not update role "openshift-console/prometheus-k8s" (808 of 886): resource may have been deleted
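      When comparing several failed job runs, it can help to extract the list of unavailable operators from this kind of log line. The following is a small sketch, assuming the `Cluster operators <a>, <b>, ... are not available` wording shown above; `unavailable_operators` is a hypothetical helper name.

```python
import re

# Log line copied verbatim from the installer output above.
LOG_LINE = (
    "level=debug msg=* Cluster operators authentication, image-registry, ingress, "
    "insights, kube-apiserver, machine-api, machine-config, monitoring, "
    "openshift-apiserver, openshift-controller-manager, openshift-samples, "
    "operator-lifecycle-manager-packageserver are not available"
)


def unavailable_operators(line):
    """Return the operator names listed in a 'not available' log line.

    Hypothetical helper: returns an empty list when the line does not match.
    """
    match = re.search(r"Cluster operators (.+?) are not available", line)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",")]


operators = unavailable_operators(LOG_LINE)
print(len(operators), "operators not available")
for name in operators:
    print(" -", name)
```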
          

      This resulted in the machine-config operator being reported as degraded and the master nodes failing:

      level=error msg=Cluster operator machine-config Degraded is True with RequiredPoolsFailed: Unable to apply 4.16.0-0.nightly-s390x-2024-03-03-131239: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)]]
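      The pool counters in that message (3 machines in total, 0 ready, 0 updated, 3 unavailable) can be extracted for comparison across runs. The following is a sketch, assuming the `Status: (pool degraded: ... total: ..., ready ..., updated: ..., unavailable: ...)` format shown above; `PoolStatus` and `parse_pool_status` are hypothetical names, and the pattern tolerates the missing colon after `ready` in this message.

```python
import re
from dataclasses import dataclass

# Relevant tail of the error message from the report above.
LOG_LINE = (
    "error MachineConfigPool master is not ready, retrying. "
    "Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)]]"
)

PATTERN = re.compile(
    r"MachineConfigPool (?P<name>\S+) is not ready.*?"
    r"pool degraded: (?P<degraded>true|false) total: (?P<total>\d+), "
    r"ready:? (?P<ready>\d+), updated: (?P<updated>\d+), "
    r"unavailable: (?P<unavailable>\d+)"
)


@dataclass
class PoolStatus:
    name: str
    degraded: bool
    total: int
    ready: int
    updated: int
    unavailable: int


def parse_pool_status(line):
    """Return a PoolStatus parsed from the message, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return PoolStatus(
        name=match.group("name"),
        degraded=match.group("degraded") == "true",
        total=int(match.group("total")),
        ready=int(match.group("ready")),
        updated=int(match.group("updated")),
        unavailable=int(match.group("unavailable")),
    )


status = parse_pool_status(LOG_LINE)
if status is not None and status.unavailable == status.total:
    print(f"pool {status.name}: all {status.total} machines unavailable")
```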
          

      Job Link: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-multiarch-master-nightly-4.16-ocp-e2e-ovn-remote-libvirt-s390x/1764278211769798656

      The issue has been observed in the last 9 job runs, since March 2nd.

      FYI: We checked with telpelt about this VM for the same time window; there were no network glitches or infrastructure problems.

          People

            rhn-engineering-skumari Sinny Kumari
            apuranda Amrut Purandare
            Doug Slavens Doug Slavens